There are at least six consumers of hotplug_memory_notifier that what they
really are interested in is whether any numa node changed its state, e.g:
going from having memory to not having memory and vice versa.
Implement a specific notifier for numa nodes when their state gets
changed, which will later be used by those consumers that are only
interested in numa node state changes.
Add documentation as well.
[dan.carpenter@linaro.org: set failure reason in offline_pages()]
Link: https://lkml.kernel.org/r/be4fd31b-7d09-46b0-8329-6d0464ffa7a5@sabinyo.mountain
Link: https://lkml.kernel.org/r/20250616135158.450136-4-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix a coding mistake in a previous fix related to system suspend
and hibernation merged recently.
-----BEGIN PGP SIGNATURE-----
iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmhw2k4SHHJqd0Byand5
c29ja2kubmV0AAoJEO5fvZ0v1OO1RMYH/2qgZ/ocUAh8SL1y4rhj19O5J5ihZtZr
Joh3zzUirX6bhxlmP9NHtLNBFNSma72rI2WUDPz9tA6RxfY/lV9CssXCIYN/w8YU
xsso4X0cOulxTvR0hiS6DcXXtFg1X/OgV6w+Pv5t1pLvnsIcCRJtGFfGU909kSKV
yap6DMbSiV3WC8B03Az3B6OUFBTaCuvt1ghs2I9F8O4b6/WuUrriYIAb3/MgWqQl
YA/STieJWUo/hgpIvC09x/Raf8cztIqEi98DCADzhU43wZNR/t8ahlOnlIPK/ujh
TnJHZdpVJUDS62reqaZHnQPxGnfltOPAqlWoM1azo8SCeouhEb1fOXM=
=TocQ
-----END PGP SIGNATURE-----
Merge tag 'pm-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a coding mistake in a previous fix related to system suspend and
hibernation merged recently"
* tag 'pm-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: Call pm_restore_gfp_mask() after dpm_resume()
dev_pm_ops.thaw() is called in following cases:
* normal case: after hibernation image has been created.
* error case 1: creation of a hibernation image has failed.
* error case 2: restoration from a hibernation image has failed.
For normal case, it is called mainly for resume storage devices for
saving the hibernation image. Other devices that are not involved
in the image saving do not need to resume the device. But since there's
no api to know which case thaw() is called, device drivers can't
conditionally resume device in thaw().
The new pm_hibernate_is_recovering() is such a api to query if thaw() is
called in normal case.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20250710062313.3226149-5-guoqing.zhang@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
The register_one_node() function was a simple wrapper around
__register_one_node(). To simplify the code, register_one_node() has been
removed, and __register_one_node() has been renamed to
register_one_node().
Link: https://lkml.kernel.org/r/8262cd0f44eeb048a1fcd3ac8382760d7f7dea60.1748452242.git.donettom@linux.ibm.com
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The function register_memory_blocks_under_node() is now only called from
the memory hotplug path, as register_memory_blocks_under_node_early()
handles registration during early boot. Therefore, the context argument
used to differentiate between early boot and hotplug is no longer needed
and was removed.
Since the function is only called from the hotplug path, we renamed
register_memory_blocks_under_node() to
register_memory_blocks_under_node_hotplug()
Link: https://lkml.kernel.org/r/907c22292b0ee4975107876efc875c75c11badd9.1748452242.git.donettom@linux.ibm.com
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The function register_mem_block_under_node_early() is no longer used, as
register_memory_blocks_under_node_early() now handles memory block
registration during early boot.
Removed register_mem_block_under_node_early() and get_nid_for_pfn(), the
latter was only used by the former.
Link: https://lkml.kernel.org/r/22e0c5d20f1d33a91d0436ad22d96628cf084d1b.1748452242.git.donettom@linux.ibm.com
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "drivers/base/node.c: optimization and cleanups", v7.
This patch (of 7)
During node device initialization, `memory blocks` are registered under
each NUMA node. The `memory blocks` to be registered are identified using
the node's start and end PFNs, which are obtained from the node's pg_data
However, not all PFNs within this range necessarily belong to the same
node—some may belong to other nodes. Additionally, due to the
discontiguous nature of physical memory, certain sections within a `memory
block` may be absent.
As a result, `memory blocks` that fall between a node's start and end PFNs
may span across multiple nodes, and some sections within those blocks may
be missing. `Memory blocks` have a fixed size, which is architecture
dependent.
Due to these considerations, the memory block registration is currently
performed as follows:
for_each_online_node(nid):
start_pfn = pgdat->node_start_pfn;
end_pfn = pgdat->node_start_pfn + node_spanned_pages;
for_each_memory_block_between(PFN_PHYS(start_pfn), PFN_PHYS(end_pfn))
mem_blk = memory_block_id(pfn_to_section_nr(pfn));
pfn_mb_start=section_nr_to_pfn(mem_blk->start_section_nr)
pfn_mb_end = pfn_start + memory_block_pfns - 1
for (pfn = pfn_mb_start; pfn < pfn_mb_end; pfn++):
if (get_nid_for_pfn(pfn) != nid):
continue;
else
do_register_memory_block_under_node(nid, mem_blk,
MEMINIT_EARLY);
Here, we derive the start and end PFNs from the node's pg_data, then
determine the memory blocks that may belong to the node. For each `memory
block` in this range, we inspect all PFNs it contains and check their
associated NUMA node ID. If a PFN within the block matches the current
node, the memory block is registered under that node.
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, get_nid_for_pfn() performs
a binary search in the `memblock regions` to determine the NUMA node ID
for a given PFN. If it is not enabled, the node ID is retrieved directly
from the struct page.
On large systems, this process can become time-consuming, especially since
we iterate over each `memory block` and all PFNs within it until a match
is found. When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, the
additional overhead of the binary search increases the execution time
significantly, potentially leading to soft lockups during boot.
In this patch, we iterate over `memblock region` to identify the `memory
blocks` that belong to the current NUMA node. `memblock regions` are
contiguous memory ranges, each associated with a single NUMA node, and
they do not span across multiple nodes.
for_each_memory_region(r): // r => region
if (!node_online(r->nid)):
continue;
else
for_each_memory_block_between(r->base, r->base + r->size - 1):
do_register_memory_block_under_node(r->nid, mem_blk, MEMINIT_EARLY);
We iterate over all memblock regions, and if the node associated with the
region is online, we calculate the start and end memory blocks based on
the region's start and end PFNs. We then register all the memory blocks
within that range under the region node.
Test Results on My system with 32TB RAM
=======================================
1. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled.
Without this patch
------------------
Startup finished in 1min 16.528s (kernel)
With this patch
---------------
Startup finished in 17.236s (kernel) - 78% Improvement
2. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT disabled.
Without this patch
------------------
Startup finished in 28.320s (kernel)
With this patch
---------------
Startup finished in 15.621s (kernel) - 46% Improvement
[donettom@linux.ibm.com: restore removed extra line]
Link: https://lkml.kernel.org/r/20250609140354.467908-1-donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/2a0a05c2dffc62a742bf1dd030098be4ce99be28.1748452241.git.donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/2a0a05c2dffc62a742bf1dd030098be4ce99be28.1748452241.git.donettom@linux.ibm.com
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 12ffc3b151 ("PM: Restrict swap use to later in the suspend
sequence") changed two pm_restore_gfp_mask() calls in enter_state()
and hibernation_restore() into one pm_restore_gfp_mask() call in
dpm_resume_end(), but it put that call before the dpm_resume()
invocation which is too early (some swap-backing devices may not be
ready at that point).
Moreover, this code ordering change was not even mentioned in the
changelog of the commit mentioned above.
Address this by moving that call after the dpm_resume() one.
Fixes: 12ffc3b151 ("PM: Restrict swap use to later in the suspend sequence")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/2797018.mvXUDI8C0e@rjwysocki.net
pointless in ->read()/->write() of file_operations used only via
debugfs_create_file()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20250702211602.GC3406663@ZenIV
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It has turned out get_dev_from_fwnode() is useful at a few other places
outside of the driver core, as in gpiolib.c for example. Therefore let's
make it available as a common helper function.
Suggested-by: Saravana Kannan <saravanak@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Hiago De Franco <hiago.franco@toradex.com> # Colibri iMX8X
Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> # TI AM62A,Xilinx ZynqMP ZCU106
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20250701114733.636510-18-ulf.hansson@linaro.org
TSA are new aspeculative side channel attacks related to the execution
timing of instructions under specific microarchitectural conditions. In
some cases, an attacker may be able to use this timing information to
infer data from other contexts, resulting in information leakage.
Add the usual controls of the mitigation and integrate it into the
existing speculation bugs infrastructure in the kernel.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhSsvQACgkQEsHwGGHe
VUrWNw//V+ZabYq3Nnvh4jEe6Altobnpn8bOIWmcBx6I3xuuArb9bLqcbKerDIcC
POVVW6zrdNigDe/U4aqaJXE7qCRX55uTYbhp8OLH0zzqX3Pjl/hUnEXWtMtlXj/G
CIM5mqjqEFp5JRGXetdjjuvjG1IPf+CbjKqj2WXbi//T6F3LiAFxkzdUhd+clBF/
ztWchjwUmqU0WJd6+Smb8ZnvWrLoZuOFldjhFad820B7fqkdJhzjHMmwBHJKUEZu
oABv8B0/4IALrx6LenCspWS4OuTOGG7DKyIgzitByXygXXb4L3ZUKpuqkxBU7hFx
bscwtOP7e5HIYAekx6ZSLZoZpYQXr1iH0aRGrjwapi3ASIpUwI0UA9ck2PdGo0IY
0GvmN0vbybskewBQyG819BM+DCau5pOLWuL7cYmaD2eTNoOHOknMDNlO8VzXqJxa
NnignSuEWFm2vNV1FXEav2YbVjlanV6JleiPDGBe5Xd9dnxZTvg9HuP2NkYio4dZ
mb/kEU/kTcN8nWh0Q96tX45kmj0vCbBgrSQkmUpyAugp38n69D1tp3ii9D/hyQFH
hKGcFC9m+rYVx1NLyAxhTGxaEqF801d5Qawwud8HsnQudTpCdSXD9fcBg9aCbWEa
FymtDpIeUQrFAjDpVEp6Syh3odKvLXsGEzL+DVvqKDuA8r6DxFo=
=2cLl
-----END PGP SIGNATURE-----
Merge tag 'tsa_x86_bugs_for_6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU speculation fixes from Borislav Petkov:
"Add the mitigation logic for Transient Scheduler Attacks (TSA)
TSA are new aspeculative side channel attacks related to the execution
timing of instructions under specific microarchitectural conditions.
In some cases, an attacker may be able to use this timing information
to infer data from other contexts, resulting in information leakage.
Add the usual controls of the mitigation and integrate it into the
existing speculation bugs infrastructure in the kernel"
* tag 'tsa_x86_bugs_for_6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/process: Move the buffer clearing before MONITOR
x86/microcode/AMD: Add TSA microcode SHAs
KVM: SVM: Advertise TSA CPUID bits to guests
x86/bugs: Add a Transient Scheduler Attacks mitigation
x86/bugs: Rename MDS machinery to something more generic
On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}),
clocks are managed through PM domains. These PM domains, registered on
behalf of the clock controller driver, are configured with
GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the
clocks are enabled/disabled using runtime PM APIs. The power domains may
also have power_on/power_off support implemented. After the device PM
domain is powered off any CPU accesses to these domains leads to system
aborts.
During probe, devices are attached to the PM domain controlling their
clocks and power. Similarly, during removal, devices are detached from the
PM domain.
The detachment call stack is as follows:
device_driver_detach() ->
device_release_driver_internal() ->
__device_release_driver() ->
device_remove() ->
platform_remove() ->
dev_pm_domain_detach()
During driver unbind, after the device is detached from its PM domain,
the device_unbind_cleanup() function is called, which subsequently
invokes devres_release_all(). This function handles devres resource
cleanup.
If runtime PM is enabled in driver probe via devm_pm_runtime_enable(),
the cleanup process triggers the action or reset function for disabling
runtime PM. This function is pm_runtime_disable_action(), which leads
to the following call stack of interest when called:
pm_runtime_disable_action() ->
pm_runtime_dont_use_autosuspend() ->
__pm_runtime_use_autosuspend() ->
update_autosuspend() ->
rpm_idle()
The rpm_idle() function attempts to resume the device at runtime.
However, at the point it is called, the device is no longer part of a PM
domain (which manages clocks and power states). If the driver implements
its own runtime PM APIs for specific functionalities - such as the
rzg2l_adc driver - while also relying on the power domain subsystem for
power management, rpm_idle() will invoke the driver's runtime PM API.
However, since the device is no longer part of a PM domain at this point,
the PM domain's runtime PM APIs will not be called. This leads to system
aborts on Renesas SoCs.
Another identified case is when a subsystem performs various cleanups
using device_unbind_cleanup(), calling driver-specific APIs in the
process. A known example is the thermal subsystem, which may call driver-
specific APIs to disable the thermal device. The relevant call stack in
this case is:
device_driver_detach() ->
device_release_driver_internal() ->
device_unbind_cleanup() ->
devres_release_all() ->
devm_thermal_of_zone_release() ->
thermal_zone_device_disable() ->
thermal_zone_device_set_mode() ->
struct thermal_zone_device_ops::change_mode()
At the moment the driver-specific change_mode() API is called, the
device is no longer part of its PM domain. Accessing its registers
without proper power management leads to system aborts.
Drop the call to dev_pm_domain_detach() from the platform bus remove
function and rely on the newly introduced call in device_unbind_cleanup().
This ensures the same effect, but the call now occurs after all
driver-specific devres resources have been freed.
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20250703112708.1621607-4-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The dev_pm_domain_attach() function is typically used in bus code
alongside dev_pm_domain_detach(), often following patterns like:
static int bus_probe(struct device *_dev)
{
struct bus_driver *drv = to_bus_driver(dev->driver);
struct bus_device *dev = to_bus_device(_dev);
int ret;
// ...
ret = dev_pm_domain_attach(_dev, true);
if (ret)
return ret;
if (drv->probe)
ret = drv->probe(dev);
// ...
}
static void bus_remove(struct device *_dev)
{
struct bus_driver *drv = to_bus_driver(dev->driver);
struct bus_device *dev = to_bus_device(_dev);
if (drv->remove)
drv->remove(dev);
dev_pm_domain_detach(_dev);
}
When the driver's probe function uses devres-managed resources that
depend on the power domain state, those resources are released later
during device_unbind_cleanup().
Releasing devres-managed resources that depend on the power domain state
after detaching the device from its PM domain can cause failures.
For example, if the driver uses devm_pm_runtime_enable() in its probe
function, and the device's clocks are managed by the PM domain, then
during removal the runtime PM is disabled in device_unbind_cleanup()
after the clocks have been removed from the PM domain. It may happen
that the devm_pm_runtime_enable() action causes the device to be runtime-
resumed. If the driver specific runtime PM APIs access registers directly,
this will lead to accessing device registers without clocks being enabled.
Similar issues may occur with other devres actions that access device
registers.
Add detach_power_off member to struct dev_pm_info, to be used
later in device_unbind_cleanup() as the power_off argument for
dev_pm_domain_detach(). This is a preparatory step toward removing
dev_pm_domain_detach() calls from bus remove functions. Since the current
PM domain detach functions (genpd_dev_pm_detach() and acpi_dev_pm_detach())
already set dev->pm_domain = NULL, there should be no issues with bus
drivers that still call dev_pm_domain_detach() in their remove functions.
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20250703112708.1621607-3-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Calling dev_pm_domain_attach()/dev_pm_domain_detach() in bus driver
probe/remove functions can affect system behavior when the drivers
attached to the bus use devres-managed resources. Since devres actions
may need to access device registers, calling dev_pm_domain_detach() too
early, i.e., before these actions complete, can cause failures on some
systems. One such example is Renesas RZ/G3S SoC-based platforms.
If the device clocks are managed via PM domains, invoking
dev_pm_domain_detach() in the bus driver's remove function removes the
device's clocks from the PM domain, preventing any subsequent
pm_runtime_resume*() calls from enabling those clocks.
The second argument of dev_pm_domain_attach() specifies whether the PM
domain should be powered on during attachment. Likewise, the second
argument of dev_pm_domain_detach() indicates whether the domain should be
powered off during detachment.
Upcoming changes address the issue described above (initially for the
platform bus only) by deferring the call to dev_pm_domain_detach() until
after devres_release_all() in device_unbind_cleanup(). The detach_power_off
field in struct dev_pm_info stores the detach power off info from the
second argument of dev_pm_domain_attach().
Because there are cases where the device's PM domain power-on/off behavior
must be conditional (e.g., in i2c_device_probe()), the patch introduces
PD_FLAG_ATTACH_POWER_ON and PD_FLAG_DETACH_POWER_OFF flags to be passed
to dev_pm_domain_attach().
Finally, dev_pm_domain_attach() and its users are updated to use the newly
introduced PD_FLAG_ATTACH_POWER_ON and PD_FLAG_DETACH_POWER_OFF macros.
This change is preparatory.
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com> # I2C
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20250703112708.1621607-2-claudiu.beznea.uj@bp.renesas.com
[ rjw: Changelog adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
pointless in ->read()/->write() of file_operations used only via
debugfs_create_file()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://patch.msgid.link/20250702211602.GC3406663@ZenIV
Signed-off-by: Mark Brown <broonie@kernel.org>
Add a new flag, called strict_midlayer, to struct dev_pm_info, along
with helper functions for updating and reading its value, to allow
middle layer code that provides proper callbacks for device suspend-
resume during system-wide PM transitions to let pm_runtime_force_suspend()
and and pm_runtime_force_resume() know that they should only invoke
runtime PM callbacks coming from the device's driver.
Namely, if this flag is set, pm_runtime_force_suspend() and
and pm_runtime_force_resume() will invoke runtime PM callbacks
provided by the device's driver directly with the assumption that
they have been called via a middle layer callback for device suspend
or resume, respectively.
For instance, acpi_general_pm_domain provides specific
callback functions for system suspend, acpi_subsys_suspend(),
acpi_subsys_suspend_late() and acpi_subsys_suspend_noirq(), and
it does not expect its runtime suspend callback function,
acpi_subsys_runtime_suspend(), to be invoked at any point during
system suspend. In particular, it does not expect that function
to be called from within any of the system suspend callback functions
mentioned above which would happen if a device driver collaborating
with acpi_general_pm_domain used pm_runtime_force_suspend() as its
callback function for any system suspend phase later than "prepare".
The new flag allows this expectation of acpi_general_pm_domain to
be formally expressed, which is going to be done subsequently.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/24017035.6Emhk5qWAg@rjwysocki.net
Add a special function for computing the address of the runtime PM
callback given by an offset relative to the start of the device
driver's struct dev_pm_ops and use it to obtain the driver callback
in __rpm_get_callback().
Also put the shared part of the callback address computation into a
separate helper function to avoid code duplication and explicit
pointer type casts.
The new __rpm_get_driver_callback() will be used subsequently for
implementing callback lookup in pm_runtime_force_suspend/resume().
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/2054356.usQuhbGJ8B@rjwysocki.net
Add a power.needs_force_resume check to pm_runtime_force_suspend() so
it need not rely on the runtime PM status of the device when deciding
whether or not to return early.
With the new check in place, pm_runtime_force_suspend() will also skip
devices with the runtime PM status equal to RPM_ACTIVE if they have
power.needs_force_resume set, so it won't need to change the RPM
status of the device to RPM_SUSPENDED in addition to setting
power.needs_force_resume in the case when pm_runtime_need_not_resume()
return false.
That allows the runtime PM status update to be removed from
pm_runtime_force_resume(), so the runtime PM status remains unchanged
between the pm_runtime_force_suspend() and pm_runtime_force_resume()
calls.
This change potentially unbreaks drivers that call pm_runtime_force_suspend()
from their ->remove() callbacks because currently, if the device being
unbound from its driver has a parent with enabled runtime PM and/or
(possibly) device links respecting runtime PM to suppliers, and it is
RPM_ACTIVE when the remove takes place, pm_runtime_force_suspend() will
not drop the parent's child count and the suppliers' runtime PM usage
counters after force-suspending the device unless pm_runtime_need_not_resume()
returns 'true' for it. Moreover, because pm_runtime_force_suspend()
changes the device's runtime PM status to RPM_SUSPENDED, in the above
case pm_runtime_reinit() will not cause those counters to drop, so they
will remain nonzero forever effectively preventing the devices in
question from runtime-suspending going forward.
This change is also needed for pm_runtime_force_suspend() to work
with PCI PM and ACPI PM after subsequent changes. Namely, say
DPM_FLAG_SMART_SUSPEND is set for a PCI device and its driver uses
pm_runtime_force_suspend() as its ->suspend() callback. If
pm_runtime_force_suspend() changed the runtime PM status of the
device to RPM_SUSPENDED, pci_pm_suspend_noirq() would skip the
device due to the dev_pm_skip_suspend() check.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/1855933.VLH7GnMWUR@rjwysocki.net
Clear power.needs_force_resume in pm_runtime_reinit() in case it has
been set by pm_runtime_force_suspend() invoked from a driver remove
callback.
Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/9495163.CDJkKcVGEf@rjwysocki.net
Curently, drivers using pm_runtime_force_suspend/resume() cannot set
DPM_FLAG_SMART_SUSPEND because the devices with that flag set may need
to be resumed during system-wide resume regardless of whether or not
they have power.needs_force_resume set. That can happen due to a
dependency resolved at the beginning of a system-wide resume transition
(for instance, a bus type or PM domain has decided to resume a
subordinate device with DPM_FLAG_SMART_SUSPEND and its parent and
suppliers also need to be resumed).
To overcome this limitation, modify pm_runtime_force_resume() to check
the device's power.smart_suspend flag (which is set for devices with
DPM_FLAG_SMART_SUSPEND set that meet some additional requirements) and
the device's runtime PM status in addition to power.needs_force_resume.
Also change it to clear power.smart_suspend to ensure that it will not
handle the same device twice during one transition.
The underlying observation is that there are two cases in which the
device needs to be resumed by pm_runtime_force_resume(). One of them
is when the device has power.needs_force_resume set, which means that
pm_runtime_force_suspend() has suspended it and decided that it should
be resumed during the subsequent system resume. The other one is when
power.smart_suspend is set and the device's runtume PM status is
RPM_ACTIVE.
Update kerneldoc comments in accordance with the code changes.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/3662906.iIbC2pHGDl@rjwysocki.net
Since pm_runtime_force_resume() and pm_runtime_need_not_resume() are only
needed for handling system-wide PM transitions, there is no reason to
compile them in if CONFIG_PM_SLEEP is unset.
Accordingly, move them under CONFIG_PM_SLEEP and make the static
inline stub for pm_runtime_force_resume() return an error to indicate
that it should not be used outside CONFIG_PM_SLEEP.
Putting pm_runtime_force_resume() also allows subsequent changes to
be more straightforward because this function is going to access a
device PM flag that is only defined when CONFIG_PM_SLEEP is set.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/3384523.aeNJFYEL58@rjwysocki.net
Since power.needs_force_resume is a bool field, use true/false
as its values instead of 1/0, respectively.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/2254988.irdbgypaU6@rjwysocki.net
Avoid starting "async" suspend processing upfront for devices that have
consumers and start "async" suspend processing for a device's suppliers
right after suspending the device itself.
Suggested-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://patch.msgid.link/3384525.44csPzL39Z@rjwysocki.net
Avoid starting "async" resume processing upfront for devices that have
suppliers and start "async" resume processing for a device's consumers
right after resuming the device itself.
Suggested-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://patch.msgid.link/3378088.aeNJFYEL58@rjwysocki.net
Drop superfluous might_sleep() calls from dpm_resume(), dpm_complete(),
and dpm_prepare(). These functions already invoke primitives that
implicitly check for sleep in atomic context:
- dpm_resume() and dpm_complete() invoke mutex_lock(), which internally
triggers might_sleep().
- dpm_prepare() calls wait_for_device_probe(), which internally uses
flush_work(), and thus might_sleep().
These annotations are unnecessary and can be dropped to reduce clutter.
Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
Link: https://patch.msgid.link/20250617084650.341262-1-quic_zhonhan@quicinc.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When __regmap_init() is called from __regmap_init_i2c() and
__regmap_init_spi() (and their devm versions), the bus argument
obtained from regmap_get_i2c_bus() and regmap_get_spi_bus(), may be
allocated using kmemdup() to support quirks. In those cases, the
bus->free_on_exit field is set to true.
However, inside __regmap_init(), buf is not freed on any error path.
This could lead to a memory leak of regmap_bus when __regmap_init()
fails. Fix that by freeing bus on error path when free_on_exit is set.
Fixes: ea030ca688 ("regmap-i2c: Set regmap max raw r/w from quirks")
Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com>
Link: https://patch.msgid.link/20250626172823.18725-1-abdun.nihaal@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Currently swap is restricted before drivers have had a chance to do
their prepare() PM callbacks. Restricting swap this early means that if
a driver needs to evict some content from memory into sawp in it's
prepare callback, it won't be able to.
On AMD dGPUs this can lead to failed suspends under memory pressure
situations as all VRAM must be evicted to system memory or swap.
Move the swap restriction to right after all devices have had a chance
to do the prepare() callback. If there is any problem with the sequence,
restore swap in the appropriate dpm resume callbacks or error handling
paths.
Closes: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Nat Wittstock <nat@fardog.io>
Tested-by: Lucian Langa <lucilanga@7pot.org>
Link: https://patch.msgid.link/20250613214413.4127087-1-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We need the driver-core fixes that are in 6.16-rc3 into here as well
to build on top of.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add the required features detection glue to bugs.c et all in order to
support the TSA mitigation.
Co-developed-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
To avoid coding mistakes like the one fixed by commit 3860cbe239 ("PM:
sleep: Fix bit masking operation"), introduce device_link_test() for
testing device link flags and use it where applicable.
No intentional functional impact.
Signed-off-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/2793309.mvXUDI8C0e@rjwysocki.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The acpi-einj conversion to faux_device_create() leads to a noisy error
message when the error injection facility is disabled. Quiet the error as
CXL error injection via ACPI expects the module to stay loaded even if the
error injection facility is disabled.
This situation arose because CXL knows proper kernel named objects to
trigger errors against, but acpi-einj knows how to perform the error
injection. The injection mechanism is shared with non-CXL use cases. The
result is CXL now has a module dependency on einj-core.ko, and init/probe
failures are handled at runtime.
Fixes: 6cb9441bfe ("ACPI: APEI: EINJ: Transition to the faux device interface")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20250607033228.1475625-3-dan.j.williams@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
faux_device_create() is almost a suitable candidate to replace
platform_driver_probe() if not for the fact that faux_device_create()
supports dynamic attach/detach of the driver.
Drop the bind attributes with the expectation that simple faux devices can
always assume that the device is permanently bound at create, and only
unbound at 'destroy'.
The acpi-einj driver depends on static bind.
Fixes: 6cb9441bfe ("ACPI: APEI: EINJ: Transition to the faux device interface")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20250607033228.1475625-2-dan.j.williams@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
Here is the big char/misc/iio and other small driver subsystem pull
request for 6.16-rc1.
Overall, a lot of individual changes, but nothing major, just the normal
constant forward progress of new device support and cleanups to existing
subsystems. Highlights in here are:
- Large IIO driver updates and additions and device tree changes
- Android binder bugfixes and logfile fixes
- mhi driver updates
- comedi driver updates
- counter driver updates and additions
- coresight driver updates and additions
- echo driver removal as there are no in-kernel users of it
- nvmem driver updates
- spmi driver updates
- new amd-sbi driver "subsystem" and drivers added
- rust miscdriver binding documentation fix
- other small driver fixes and updates (uio, w1, acrn, hpet, xillybus,
cardreader drivers, fastrpc and others.)
All of these have been in linux-next for quite a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaEKg5Q8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykUyACgmAzrzKMoQUwwhQ6ed2l7tHdrlOcAoIORI1/x
pNqQdrE1EbmAAyl47IN4
=ts6J
-----END PGP SIGNATURE-----
Merge tag 'char-misc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc / iio driver updates from Greg KH:
"Here is the big char/misc/iio and other small driver subsystem pull
request for 6.16-rc1.
Overall, a lot of individual changes, but nothing major, just the
normal constant forward progress of new device support and cleanups to
existing subsystems. Highlights in here are:
- Large IIO driver updates and additions and device tree changes
- Android binder bugfixes and logfile fixes
- mhi driver updates
- comedi driver updates
- counter driver updates and additions
- coresight driver updates and additions
- echo driver removal as there are no in-kernel users of it
- nvmem driver updates
- spmi driver updates
- new amd-sbi driver "subsystem" and drivers added
- rust miscdriver binding documentation fix
- other small driver fixes and updates (uio, w1, acrn, hpet,
xillybus, cardreader drivers, fastrpc and others)
All of these have been in linux-next for quite a while with no
reported problems"
* tag 'char-misc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (390 commits)
binder: fix yet another UAF in binder_devices
counter: microchip-tcb-capture: Add watch validation support
dt-bindings: iio: adc: Add ROHM BD79100G
iio: adc: add support for Nuvoton NCT7201
dt-bindings: iio: adc: add NCT7201 ADCs
iio: chemical: Add driver for SEN0322
dt-bindings: trivial-devices: Document SEN0322
iio: adc: ad7768-1: reorganize driver headers
iio: bmp280: zero-init buffer
iio: ssp_sensors: optimalize -> optimize
HID: sensor-hub: Fix typo and improve documentation
iio: admv1013: replace redundant ternary operator with just len
iio: chemical: mhz19b: Fix error code in probe()
iio: adc: at91-sama5d2: use IIO_DECLARE_BUFFER_WITH_TS
iio: accel: sca3300: use IIO_DECLARE_BUFFER_WITH_TS
iio: adc: ad7380: use IIO_DECLARE_DMA_BUFFER_WITH_TS
iio: adc: ad4695: rename AD4695_MAX_VIN_CHANNELS
iio: adc: ad4695: use IIO_DECLARE_DMA_BUFFER_WITH_TS
iio: introduce IIO_DECLARE_BUFFER_WITH_TS macros
iio: make IIO_DMA_MINALIGN minimum of 8 bytes
...
Fix three issues introduced into device suspend/resume error paths in
the PM core by some of the recent updates.
First off, replace list_splice() with list_splice_init() in three places
in device suspend error paths to avoid attempting to use an uninitialized
list head going forward.
Second, rearrange device_resume() to avoid leaking the power.is_suspended
device PM flag to the next system suspend/resume cycle where it can
confuse rolling back after an error or early wakeup.
Finally, add synchronization to dpm_async_resume_children() to avoid
resetting the async state mistakenly for devices whose resume callbacks
have already been queued up for asynchronous execution in the given
device resume phase, which fortunately can happen only if the preceding
system suspend transition has been aborted.
-----BEGIN PGP SIGNATURE-----
iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmhB5vASHHJqd0Byand5
c29ja2kubmV0AAoJEO5fvZ0v1OO1k7IH/RyTAYq/CyilGYCk7/Sd2NgOOrlkhapV
s6J3qi13aegcVX1sskJTlgR893Zh1GVIEbNH0DtQz8ExhPb+XIwCxWVIZTaDfogj
q40Y7DFn7XIhIH9KkmUab0/PKGdJjxz5EnLfJCFxYjztHC7W4Vx4hnaQyysFXa3t
r1hm2e2wWeVKzayFkMUblfXAHX2PSy/JoHZTy71yMDO7XpOui97gr8DsINdXgG6e
yCuTQX+3S0bwxnlRL6sHMvvaXMqZZzjVz50uJojP6bQ+7TdppYVIz/y6FVMuOmDd
sAoAWz0/rAukjXf3bzqPcHJLvDixBXokFkTfAyMb9a+SKoofTSvULkY=
=Wj0P
-----END PGP SIGNATURE-----
Merge tag 'pm-6.16-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Fix three issues introduced into device suspend/resume error paths in
the PM core by some of the recent updates.
First off, replace list_splice() with list_splice_init() in three
places in device suspend error paths to avoid attempting to use an
uninitialized list head going forward.
Second, rearrange device_resume() to avoid leaking the
power.is_suspended device PM flag to the next system suspend/resume
cycle where it can confuse rolling back after an error or early
wakeup.
Finally, add synchronization to dpm_async_resume_children() to avoid
resetting the async state mistakenly for devices whose resume
callbacks have already been queued up for asynchronous execution in
the given device resume phase, which fortunately can happen only if
the preceding system suspend transition has been aborted"
* tag 'pm-6.16-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: Add locking to dpm_async_resume_children()
PM: sleep: Fix power.is_suspended cleanup for direct-complete devices
PM: sleep: Fix list splicing in device suspend error paths
Commit 0cbef962ce ("PM: sleep: Resume children after resuming the
parent") introduced a subtle concurrency issue that may lead to a kernel
crash if system suspend is aborted and may also slow down asynchronous
device resume otherwise.
Namely, the initial list walks in dpm_noirq_resume_devices(),
dpm_resume_early(), and dpm_resume() call dpm_clear_async_state() for
every device and attempt to asynchronously resume it if it has no
children (so it is a "root" device). The asynchronous resume of a
root device triggers an attempt to asynchronously resume its children
which may take place before calling dpm_clear_async_state() for them
due to the lack of synchronization between dpm_async_resume_children()
and the code calling dpm_clear_async_state(). If this happens, the
dpm_clear_async_state() that comes in late, will clear
power.work_in_progress for the given device after it has been set by
__dpm_async(), so the suspend callback will be allowed to run once
again for the same device during the same transition. This leads to
a whole range of interesting breakage.
Fortunately, if the suspend transition is not aborted, power.work_in_progress
is set by it for all devices, so dpm_async_resume_children() will not
schedule asynchronous resume for them until dpm_clear_async_state()
clears that flag, but this means missing an opportunity to start the
resume of those devices earlier.
Address the above issue by adding dpm_list_mtx locking to
dpm_async_resume_children(), so it will wait for the entire initial
list walk and the invocation of dpm_clear_async_state() for all devices
to be completed before scheduling any new asynchronous resume callbacks.
Fixes: 0cbef962ce ("PM: sleep: Resume children after resuming the parent")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4280
Reported-and-tested-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/13779172.uLZWGnKmhe@rjwysocki.net
Commit 03f1444016 ("PM: sleep: Fix handling devices with direct_complete
set on errors") caused power.is_suspended to be set for devices with
power.direct_complete set, but it forgot to ensure the clearing of that
flag for them in device_resume(), so power.is_suspended is still set for
them during the next system suspend-resume cycle.
If that cycle is aborted in dpm_suspend(), the subsequent invocation of
dpm_resume() will trigger a device_resume() call for every device and
because power.is_suspended is set for the devices in question, they will
not be skipped by device_resume() as expected which causes scary error
messages to be logged (as appropriate).
To address this issue, move the clearing of power.is_suspended in
device_resume() immediately after the power.is_suspended check so it
will be always cleared for all devices processed by that function.
Fixes: 03f1444016 ("PM: sleep: Fix handling devices with direct_complete set on errors")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4280
Reported-and-tested-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/4990586.GXAFRqVoOG@rjwysocki.net
Commits aa7a9275ab ("PM: sleep: Suspend async parents after suspending
children") and 443046d1ad ("PM: sleep: Make suspend of devices more
asynchronous") added list splicing to the error paths of dpm_suspend(),
dpm_suspend_late(), and dpm_noirq_suspend_devices(), but they should
have used the list_splice_init() variant because the emptied list is
used going forward in all of these cases.
Replace list_splice() with list_splice_init() in the code in question as
appropriate.
Fixes: aa7a9275ab ("PM: sleep: Suspend async parents after suspending children")
Fixes: 443046d1ad ("PM: sleep: Make suspend of devices more asynchronous")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4280
Reported-and-tested-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/4659282.LvFx2qVVIh@rjwysocki.net
* LED Triggers:
* Allow writing "default" to the sysfs 'trigger' attribute to set an LED to its default trigger
* If the default trigger is "none", writing "default" will remove the current trigger
* Updated sysfs ABI documentation for the new "default" trigger functionality
* LED KUnit Testing:
* Provide a skeleton KUnit test suite for the LEDs framework
* Expand the LED class device registration KUnit test to cover more scenarios, including
`brightness_get` behavior
* Add KUnit tests for the LED lookup and get API (`led_add_lookup`, `devm_led_get`)
* LED Flash Class:
* Add support for setting flash/strobe duration through a new `duration_set` op and
`led_set_flash_duration()` function, aligning with `V4L2_CID_FLASH_DURATION`
* Texas Instruments TPS6131x:
* Add a new driver for the TPS61310/TPS61311 flash LED controllers
* The driver supports the device's three constant-current sinks for flash and torch modes
* LED Core:
* Prevent potential `snprintf()` truncations in LED names by checking for buffer overflows
* ChromeOS EC LEDs:
* Avoid a -Wflex-array-member-not-at-end GCC warning by replacing an on-stack flexible structure
definition with a utility function call
* Multicolor LEDs:
* Fix issue where setting multi_intensity while software blinking is active could stop blinking
* PCA955x LEDs:
* Avoid potential buffer overflow when creating default labels by changing a field's type to
`u8` and updating format specifiers
* PCA995x LEDs:
* Fix a typo (stray space) in an `of_device_id` entry in the `pca995x_of_match` table
* Kconfig:
* Prevent LED drivers from being enabled by default when `COMPILE_TEST` is set
* Device Property API:
* Split `device_get_child_node_count()` into a new helper `fwnode_get_child_node_count()` that
doesn't require a device struct, making the API more symmetrical
* Driver Modernization (using `fwnode_get_child_node_count()`):
* Update `leds-pwm-multicolor`, `leds-ncp5623` and `leds-ncp5623` to use the new
`fwnode_get_child_node_count()` helper, removing their custom implementation
* As above in the USB Type-C TCPM driver
* Driver Modernization (using new GPIO setter callbacks):
* Convert `leds-lgm-sso` to use new GPIO line value setter callbacks which return an integer
for error handling
* Convert `leds-pca955x`, `leds-pca9532` and `leds-tca6507` to use new GPIO setter callbacks
* Documentation:
* Remove the `.rst` extension for `leds-st1202` in the documentation index for consistency
* LP8860 LEDs:
* Use `regmap_multi_reg_write()` for EEPROM writes instead of manual looping
* Use scoped mutex guards and `devm_mutex_init()` to simplify function exits and ensure
automatic cleanup
* Remove default register definitions that are unused when regmap caching is not active
* Use `devm_regulator_get_enable_optional()` to handle the optional regulator, simplifying
enabling and removing manual disabling
* Refactor `lp8860_unlock_eeprom()` to only perform the unlock operation, removing the lock
part and an unnecessary parameter
* Use a `devm` action to disable the enable-GPIO, simplifying cleanup and error paths, and
remove the now-empty `.remove()` function
* Turris Omnia LEDs:
* Drop unnecessary commas in terminator entries of `struct attribute` and
`struct of_device_id` arrays
* MT6370 RGB LEDs:
* Use the `LINEAR_RANGE()` for defining `struct linear_range` entries to improve robustness
* Texas Instruments TPS6131x:
* Add new devicetree bindings for the TI TPS61310/TPS61311 flash LED driver
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAmg+xocACgkQUa+KL4f8
d2HbmA/+KfKGT6C3q7TdWgCrrEvYk0CIfJ7g3cTjcQVjCg2uojHVgvGEgX6i2M5e
bYbLKfoh1JhSjGErPadE3VExSCFAkNhHIH5k81sEPbzDTI/PscLdWNuv0T/ZhZn+
xHcl721Du9w/7TK+IN1VBiz5oTfvPbca+2hLGmyJAWgGetNJdHmZlJkJpSmWEeaj
SbNQ3c4zy0LQyMfWISUdOH4nmL2E2D5dsIOxGEER7E0djLy5ZPomVwdgotTys6/4
1xFAnZTw6MlnO3tCQ5R0SCRadc7NRp6XFeyGCk0JyKTju/OXLoB0QVJSORZ5WBsQ
0z6ILnt8+2xG5GS69zoSk3J5eSNsswvxO+v6GOTSvL3+Cf2UXZpsIMYFoGq2ol0p
qjHFJnenOpCbUl84vCRvG87vtHi9cdpgyatejU/vSg9Mtokrjuh/yz73HIrr/+Pr
rwq/PZ5FkCj/pIWyTYUMivS/qfC0qs71Btee9eGNxmQgzKli0E+KtduqQtE733VA
BH6lYhAFGALx+KLZLuVJdJ5EVgM4Ui+mWHX9ztPIVJ1NDCq0SahqGK6JskPvi+L5
fgB2jImDggjcDSYcVyM4cu4p73nN3Mbadu7ZvAEuFSzrJS1IlHpYvyYGhpuaw6Nn
r6dscjnKSwFT6ANbVKB21x7YqZmKNgnD3IhRskmTfSPAGu4kEoA=
=iVDb
-----END PGP SIGNATURE-----
Merge tag 'leds-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones:
"LED Triggers:
- Allow writing "default" to the sysfs 'trigger' attribute to set an
LED to its default trigger
- If the default trigger is "none", writing "default" will remove the
current trigger
- Updated sysfs ABI documentation for the new "default" trigger
functionality
LED KUnit Testing:
- Provide a skeleton KUnit test suite for the LEDs framework
- Expand the LED class device registration KUnit test to cover more
scenarios, including 'brightness_get' behavior
- Add KUnit tests for the LED lookup and get API ('led_add_lookup',
'devm_led_get')
LED Flash Class:
- Add support for setting flash/strobe duration through a new
'duration_set' op and 'led_set_flash_duration()' function, aligning
with 'V4L2_CID_FLASH_DURATION'
Texas Instruments TPS6131x:
- Add a new driver for the TPS61310/TPS61311 flash LED controllers
- The driver supports the device's three constant-current sinks for
flash and torch modes
LED Core:
- Prevent potential 'snprintf()' truncations in LED names by checking
for buffer overflows
ChromeOS EC LEDs:
- Avoid a -Wflex-array-member-not-at-end GCC warning by replacing an
on-stack flexible structure definition with a utility function call
Multicolor LEDs:
- Fix issue where setting multi_intensity while software blinking is
active could stop blinking
PCA955x LEDs:
- Avoid potential buffer overflow when creating default labels by
changing a field's type to 'u8' and updating format specifiers
PCA995x LEDs:
- Fix a typo (stray space) in an 'of_device_id' entry in the
'pca995x_of_match' table
Kconfig:
- Prevent LED drivers from being enabled by default when
'COMPILE_TEST' is set
Device Property API:
- Split 'device_get_child_node_count()' into a new helper
'fwnode_get_child_node_count()' that doesn't require a device
struct, making the API more symmetrical
Driver Modernization (using 'fwnode_get_child_node_count()'):
- Update 'leds-pwm-multicolor', 'leds-ncp5623' and 'leds-ncp5623' to
use the new 'fwnode_get_child_node_count()' helper, removing their
custom implementation
- As above in the USB Type-C TCPM driver
Driver Modernization (using new GPIO setter callbacks):
- Convert 'leds-lgm-sso' to use new GPIO line value setter callbacks
which return an integer for error handling
- Convert 'leds-pca955x', 'leds-pca9532' and 'leds-tca6507' to use
new GPIO setter callbacks
Documentation:
- Remove the '.rst' extension for 'leds-st1202' in the documentation
index for consistency
LP8860 LEDs:
- Use 'regmap_multi_reg_write()' for EEPROM writes instead of manual
looping
- Use scoped mutex guards and 'devm_mutex_init()' to simplify
function exits and ensure automatic cleanup
- Remove default register definitions that are unused when regmap
caching is not active
- Use 'devm_regulator_get_enable_optional()' to handle the optional
regulator, simplifying enabling and removing manual disabling
- Refactor 'lp8860_unlock_eeprom()' to only perform the unlock
operation, removing the lock part and an unnecessary parameter
- Use a 'devm' action to disable the enable-GPIO, simplifying cleanup
and error paths, and remove the now-empty '.remove()' function
Turris Omnia LEDs:
- Drop unnecessary commas in terminator entries of 'struct attribute'
and 'struct of_device_id' arrays
MT6370 RGB LEDs:
- Use the 'LINEAR_RANGE()' for defining 'struct linear_range' entries
to improve robustness
Texas Instruments TPS6131x:
- Add new devicetree bindings for the TI TPS61310/TPS61311 flash LED
driver"
* tag 'leds-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (31 commits)
leds: tps6131x: Add support for Texas Instruments TPS6131X flash LED driver
dt-bindings: leds: Add Texas Instruments TPS6131x flash LED driver
leds: flash: Add support for flash/strobe duration
leds: rgb: leds-mt6370-rgb: Improve definition of some struct linear_range
leds: led-test: Provide tests for the lookup and get infrastructure
leds: led-test: Fill out the registration test to cover more test cases
leds: led-test: Remove standard error checking after KUNIT_ASSERT_*()
leds: pca995x: Fix typo in pca995x_of_match's of_device_id entry
leds: Provide skeleton KUnit testing for the LEDs framework
leds: tca6507: Use new GPIO line value setter callbacks
leds: pca9532: Use new GPIO line value setter callbacks
leds: pca955x: Use new GPIO line value setter callbacks
leds: lgm-sso: Use new GPIO line value setter callbacks
leds: Do not enable by default during compile testing
leds: turris-omnia: Drop commas in the terminator entries
leds: lp8860: Disable GPIO with devm action
leds: lp8860: Only unlock in lp8860_unlock_eeprom()
leds: lp8860: Enable regulator using enable_optional helper
leds: lp8860: Remove default regs when not caching
leds: lp8860: Use new mutex guards to cleanup function exits
...
simplifies the act of creating a pte which addresses the first page in a
folio and reduces the amount of plumbing which architecture must
implement to provide this.
- The 8 patch series "Misc folio patches for 6.16" from Matthew Wilcox
is a shower of largely unrelated folio infrastructure changes which
clean things up and better prepare us for future work.
- The 3 patch series "memory,x86,acpi: hotplug memory alignment
advisement" from Gregory Price adds early-init code to prevent x86 from
leaving physical memory unused when physical address regions are not
aligned to memory block size.
- The 2 patch series "mm/compaction: allow more aggressive proactive
compaction" from Michal Clapinski provides some tuning of the (sadly,
hard-coded (more sadly, not auto-tuned)) thresholds for our invokation
of proactive compaction. In a simple test case, the reduction of a guest
VM's memory consumption was dramatic.
- The 8 patch series "Minor cleanups and improvements to swap freeing
code" from Kemeng Shi provides some code cleaups and a small efficiency
improvement to this part of our swap handling code.
- The 6 patch series "ptrace: introduce PTRACE_SET_SYSCALL_INFO API"
from Dmitry Levin adds the ability for a ptracer to modify syscalls
arguments. At this time we can alter only "system call information that
are used by strace system call tampering, namely, syscall number,
syscall arguments, and syscall return value.
This series should have been incorporated into mm.git's "non-MM"
branch, but I goofed.
- The 3 patch series "fs/proc: extend the PAGEMAP_SCAN ioctl to report
guard regions" from Andrei Vagin extends the info returned by the
PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more
efficiently get at the info about guard regions.
- The 2 patch series "Fix parameter passed to page_mapcount_is_type()"
from Gavin Shan implements that fix. No runtime effect is expected
because validate_page_before_insert() happens to fix up this error.
- The 3 patch series "kernel/events/uprobes: uprobe_write_opcode()
rewrite" from David Hildenbrand basically brings uprobe text poking into
the current decade. Remove a bunch of hand-rolled implementation in
favor of using more current facilities.
- The 3 patch series "mm/ptdump: Drop assumption that pxd_val() is u64"
from Anshuman Khandual provides enhancements and generalizations to the
pte dumping code. This might be needed when 128-bit Page Table
Descriptors are enabled for ARM.
- The 12 patch series "Always call constructor for kernel page tables"
from Kevin Brodsky "ensures that the ctor/dtor is always called for
kernel pgtables, as it already is for user pgtables". This permits the
addition of more functionality such as "insert hooks to protect page
tables". This change does result in various architectures performing
unnecesary work, but this is fixed up where it is anticipated to occur.
- The 9 patch series "Rust support for mm_struct, vm_area_struct, and
mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM
structures.
- The 3 patch series "fix incorrectly disallowed anonymous VMA merges"
from Lorenzo Stoakes takes advantage of some VMA merging opportunities
which we've been missing for 15 years.
- The 4 patch series "mm/madvise: batch tlb flushes for MADV_DONTNEED
and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB
flushing. Instead of flushing each address range in the provided iovec,
we batch the flushing across all the iovec entries. The syscall's cost
was approximately halved with a microbenchmark which was designed to
load this particular operation.
- The 6 patch series "Track node vacancy to reduce worst case allocation
counts" from Sidhartha Kumar makes the maple tree smarter about its node
preallocation. stress-ng mmap performance increased by single-digit
percentages and the amount of unnecessarily preallocated memory was
dramaticelly reduced.
- The 3 patch series "mm/gup: Minor fix, cleanup and improvements" from
Baoquan He removes a few unnecessary things which Baoquan noted when
reading the code.
- The 3 patch series ""Enhance sysfs handling for memory hotplug in
weighted interleave" from Rakie Kim "enhances the weighted interleave
policy in the memory management subsystem by improving sysfs handling,
fixing memory leaks, and introducing dynamic sysfs updates for memory
hotplug support". Fixes things on error paths which we are unlikely to
hit.
- The 7 patch series "mm/damon: auto-tune DAMOS for NUMA setups
including tiered memory" from SeongJae Park introduces new DAMOS quota
goal metrics which eliminate the manual tuning which is required when
utilizing DAMON for memory tiering.
- The 5 patch series "mm/vmalloc.c: code cleanup and improvements" from
Baoquan He provides cleanups and small efficiency improvements which
Baoquan found via code inspection.
- The 2 patch series "vmscan: enforce mems_effective during demotion"
from Gregory Price "changes reclaim to respect cpuset.mems_effective
during demotion when possible". because "presently, reclaim explicitly
ignores cpuset.mems_effective when demoting, which may cause the cpuset
settings to violated." "This is useful for isolating workloads on a
multi-tenant system from certain classes of memory more consistently."
- The 2 patch series ""Clean up split_huge_pmd_locked() and remove
unnecessary folio pointers" from Gavin Guo provides minor cleanups and
efficiency gains in in the huge page splitting and migrating code.
- The 3 patch series "Use kmem_cache for memcg alloc" from Huan Yang
creates a slab cache for `struct mem_cgroup', yielding improved memory
utilization.
- The 4 patch series "add max arg to swappiness in memory.reclaim and
lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness="
argument for memory.reclaim MGLRU's lru_gen. This directs proactive
reclaim to reclaim from only anon folios rather than file-backed folios.
- The 17 patch series "kexec: introduce Kexec HandOver (KHO)" from Mike
Rapoport is the first step on the path to permitting the kernel to
maintain existing VMs while replacing the host kernel via file-based
kexec. At this time only memblock's reserve_mem is preserved.
- The 7 patch series "mm: Introduce for_each_valid_pfn()" from David
Woodhouse provides and uses a smarter way of looping over a pfn range.
By skipping ranges of invalid pfns.
- The 2 patch series "sched/numa: Skip VMA scanning on memory pinned to
one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless
VMA scanning when a task is pinned a single NUMA mode. Dramatic
performance benefits were seen in some real world cases.
- The 2 patch series "JFS: Implement migrate_folio for
jfs_metapage_aops" from Shivank Garg addresses a warning which occurs
during memory compaction when using JFS.
- The 4 patch series "move all VMA allocation, freeing and duplication
logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c
into the more appropriate mm/vma.c.
- The 6 patch series "mm, swap: clean up swap cache mapping helper" from
Kairui Song provides code consolidation and cleanups related to the
folio_index() function.
- The 2 patch series "mm/gup: Cleanup memfd_pin_folios()" from Vishal
Moola does that.
- The 8 patch series "memcg: Fix test_memcg_min/low test failures" from
Waiman Long addresses some bogus failures which are being reported by
the test_memcontrol selftest.
- The 3 patch series "eliminate mmap() retry merge, add .mmap_prepare
hook" from Lorenzo Stoakes commences the deprecation of
file_operations.mmap() in favor of the new
file_operations.mmap_prepare(). The latter is more restrictive and
prevents drivers from messing with things in ways which, amongst other
problems, may defeat VMA merging.
- The 4 patch series "memcg: decouple memcg and objcg stocks"" from
Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's
one. This is a step along the way to making memcg and objcg charging
NMI-safe, which is a BPF requirement.
- The 6 patch series "mm/damon: minor fixups and improvements for code,
tests, and documents" from SeongJae Park is "yet another batch of
miscellaneous DAMON changes. Fix and improve minor problems in code,
tests and documents."
- The 7 patch series "memcg: make memcg stats irq safe" from Shakeel
Butt converts memcg stats to be irq safe. Another step along the way to
making memcg charging and stats updates NMI-safe, a BPF requirement.
- The 4 patch series "Let unmap_hugepage_range() and several related
functions take folio instead of page" from Fan Ni provides folio
conversions in the hugetlb code.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaDt5qgAKCRDdBJ7gKXxA
ju6XAP9nTiSfRz8Cz1n5LJZpFKEGzLpSihCYyR6P3o1L9oe3mwEAlZ5+XAwk2I5x
Qqb/UGMEpilyre1PayQqOnct3aSL9Ao=
=tYYm
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of
creating a pte which addresses the first page in a folio and reduces
the amount of plumbing which architecture must implement to provide
this.
- "Misc folio patches for 6.16" from Matthew Wilcox is a shower of
largely unrelated folio infrastructure changes which clean things up
and better prepare us for future work.
- "memory,x86,acpi: hotplug memory alignment advisement" from Gregory
Price adds early-init code to prevent x86 from leaving physical
memory unused when physical address regions are not aligned to memory
block size.
- "mm/compaction: allow more aggressive proactive compaction" from
Michal Clapinski provides some tuning of the (sadly, hard-coded (more
sadly, not auto-tuned)) thresholds for our invokation of proactive
compaction. In a simple test case, the reduction of a guest VM's
memory consumption was dramatic.
- "Minor cleanups and improvements to swap freeing code" from Kemeng
Shi provides some code cleaups and a small efficiency improvement to
this part of our swap handling code.
- "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin
adds the ability for a ptracer to modify syscalls arguments. At this
time we can alter only "system call information that are used by
strace system call tampering, namely, syscall number, syscall
arguments, and syscall return value.
This series should have been incorporated into mm.git's "non-MM"
branch, but I goofed.
- "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from
Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl
against /proc/pid/pagemap. This permits CRIU to more efficiently get
at the info about guard regions.
- "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan
implements that fix. No runtime effect is expected because
validate_page_before_insert() happens to fix up this error.
- "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David
Hildenbrand basically brings uprobe text poking into the current
decade. Remove a bunch of hand-rolled implementation in favor of
using more current facilities.
- "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman
Khandual provides enhancements and generalizations to the pte dumping
code. This might be needed when 128-bit Page Table Descriptors are
enabled for ARM.
- "Always call constructor for kernel page tables" from Kevin Brodsky
ensures that the ctor/dtor is always called for kernel pgtables, as
it already is for user pgtables.
This permits the addition of more functionality such as "insert hooks
to protect page tables". This change does result in various
architectures performing unnecesary work, but this is fixed up where
it is anticipated to occur.
- "Rust support for mm_struct, vm_area_struct, and mmap" from Alice
Ryhl adds plumbing to permit Rust access to core MM structures.
- "fix incorrectly disallowed anonymous VMA merges" from Lorenzo
Stoakes takes advantage of some VMA merging opportunities which we've
been missing for 15 years.
- "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from
SeongJae Park optimizes process_madvise()'s TLB flushing.
Instead of flushing each address range in the provided iovec, we
batch the flushing across all the iovec entries. The syscall's cost
was approximately halved with a microbenchmark which was designed to
load this particular operation.
- "Track node vacancy to reduce worst case allocation counts" from
Sidhartha Kumar makes the maple tree smarter about its node
preallocation.
stress-ng mmap performance increased by single-digit percentages and
the amount of unnecessarily preallocated memory was dramaticelly
reduced.
- "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes
a few unnecessary things which Baoquan noted when reading the code.
- ""Enhance sysfs handling for memory hotplug in weighted interleave"
from Rakie Kim "enhances the weighted interleave policy in the memory
management subsystem by improving sysfs handling, fixing memory
leaks, and introducing dynamic sysfs updates for memory hotplug
support". Fixes things on error paths which we are unlikely to hit.
- "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory"
from SeongJae Park introduces new DAMOS quota goal metrics which
eliminate the manual tuning which is required when utilizing DAMON
for memory tiering.
- "mm/vmalloc.c: code cleanup and improvements" from Baoquan He
provides cleanups and small efficiency improvements which Baoquan
found via code inspection.
- "vmscan: enforce mems_effective during demotion" from Gregory Price
changes reclaim to respect cpuset.mems_effective during demotion when
possible. because presently, reclaim explicitly ignores
cpuset.mems_effective when demoting, which may cause the cpuset
settings to violated.
This is useful for isolating workloads on a multi-tenant system from
certain classes of memory more consistently.
- "Clean up split_huge_pmd_locked() and remove unnecessary folio
pointers" from Gavin Guo provides minor cleanups and efficiency gains
in in the huge page splitting and migrating code.
- "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache
for `struct mem_cgroup', yielding improved memory utilization.
- "add max arg to swappiness in memory.reclaim and lru_gen" from
Zhongkun He adds a new "max" argument to the "swappiness=" argument
for memory.reclaim MGLRU's lru_gen.
This directs proactive reclaim to reclaim from only anon folios
rather than file-backed folios.
- "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the
first step on the path to permitting the kernel to maintain existing
VMs while replacing the host kernel via file-based kexec. At this
time only memblock's reserve_mem is preserved.
- "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides
and uses a smarter way of looping over a pfn range. By skipping
ranges of invalid pfns.
- "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning
when a task is pinned a single NUMA mode.
Dramatic performance benefits were seen in some real world cases.
- "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank
Garg addresses a warning which occurs during memory compaction when
using JFS.
- "move all VMA allocation, freeing and duplication logic to mm" from
Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more
appropriate mm/vma.c.
- "mm, swap: clean up swap cache mapping helper" from Kairui Song
provides code consolidation and cleanups related to the folio_index()
function.
- "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.
- "memcg: Fix test_memcg_min/low test failures" from Waiman Long
addresses some bogus failures which are being reported by the
test_memcontrol selftest.
- "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo
Stoakes commences the deprecation of file_operations.mmap() in favor
of the new file_operations.mmap_prepare().
The latter is more restrictive and prevents drivers from messing with
things in ways which, amongst other problems, may defeat VMA merging.
- "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples
the per-cpu memcg charge cache from the objcg's one.
This is a step along the way to making memcg and objcg charging
NMI-safe, which is a BPF requirement.
- "mm/damon: minor fixups and improvements for code, tests, and
documents" from SeongJae Park is yet another batch of miscellaneous
DAMON changes. Fix and improve minor problems in code, tests and
documents.
- "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg
stats to be irq safe. Another step along the way to making memcg
charging and stats updates NMI-safe, a BPF requirement.
- "Let unmap_hugepage_range() and several related functions take folio
instead of page" from Fan Ni provides folio conversions in the
hugetlb code.
* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
mm: pcp: increase pcp->free_count threshold to trigger free_high
mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
mm/hugetlb: pass folio instead of page to unmap_ref_private()
memcg: objcg stock trylock without irq disabling
memcg: no stock lock for cpu hot-unplug
memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
memcg: make count_memcg_events re-entrant safe against irqs
memcg: make mod_memcg_state re-entrant safe against irqs
memcg: move preempt disable to callers of memcg_rstat_updated
memcg: memcg_rstat_updated re-entrant safe against irqs
mm: khugepaged: decouple SHMEM and file folios' collapse
selftests/eventfd: correct test name and improve messages
alloc_tag: check mem_profiling_support in alloc_tag_init
Docs/damon: update titles and brief introductions to explain DAMOS
selftests/damon/_damon_sysfs: read tried regions directories in order
mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
...
Here are the driver core / kernfs changes for 6.16-rc1.
Not a huge number of changes this development cycle, here's the summary
of what is included in here:
- kernfs locking tweaks, pushing some global locks down into a per-fs
image lock
- rust driver core and pci device bindings added for new features.
- sysfs const work for bin_attributes. This churn should now be
completed for those types of attributes
- auxbus device creation helpers added
- fauxbus fix for creating sysfs files after the probe completed
properly
- other tiny updates for driver core things.
All of these have been in linux-next for over a week with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaDbe+g8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylYbACgl/MngU9pRnx5jZIQh6bWveFSeo8AnRE4U5x0
X+lgTPjGKL1RrV3C5HJp
=+0BA
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Greg KH:
"Here are the driver core / kernfs changes for 6.16-rc1.
Not a huge number of changes this development cycle, here's the
summary of what is included in here:
- kernfs locking tweaks, pushing some global locks down into a per-fs
image lock
- rust driver core and pci device bindings added for new features.
- sysfs const work for bin_attributes.
The final churn of switching away from and removing the
transitional struct members, "read_new", "write_new" and
"bin_attrs_new" will come after the merge window to avoid
unnecesary merge conflicts.
- auxbus device creation helpers added
- fauxbus fix for creating sysfs files after the probe completed
properly
- other tiny updates for driver core things.
All of these have been in linux-next for over a week with no reported
issues"
* tag 'driver-core-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
kernfs: Relax constraint in draining guard
Documentation: embargoed-hardware-issues.rst: Remove myself
drivers: hv: fix up const issue with vmbus_chan_bin_attrs
firmware_loader: use SHA-256 library API instead of crypto_shash API
docs: debugfs: do not recommend debugfs_remove_recursive
PM: wakeup: Do not expose 4 device wakeup source APIs
kernfs: switch global kernfs_rename_lock to per-fs lock
kernfs: switch global kernfs_idr_lock to per-fs lock
driver core: auxiliary bus: Fix IS_ERR() vs NULL mixup in __devm_auxiliary_device_create()
sysfs: constify attribute_group::bin_attrs
sysfs: constify bin_attribute argument of bin_attribute::read/write()
software node: Correct a OOB check in software_node_get_reference_args()
devres: simplify devm_kstrdup() using devm_kmemdup()
platform: replace magic number with macro PLATFORM_DEVID_NONE
component: do not try to unbind unbound components
driver core: auxiliary bus: add device creation helpers
driver core: faux: Add sysfs groups after probing
- Fix potential division-by-zero error in em_compute_costs() (Yaxiong
Tian).
- Fix typos in energy model documentation and example driver code (Moon
Hee Lee, Atul Kumar Pant).
- Rearrange the energy model management code and add a new function for
adjusting a CPU energy model after adjusting the capacity of the
given CPU to it (Rafael Wysocki).
- Refactor cpufreq_online(), add and use cpufreq policy locking guards,
use __free() in policy reference counting, and clean up core cpufreq
code on top of that (Rafael Wysocki).
- Fix boost handling on CPU suspend/resume and sysfs updates (Viresh
Kumar).
- Fix des_perf clamping with max_perf in amd_pstate_update() (Dhananjay
Ugwekar).
- Add offline, online and suspend callbacks to the amd-pstate driver,
rename and use the existing amd_pstate_epp callbacks in it (Dhananjay
Ugwekar).
- Add support for the "Requested CPU Min frequency" BIOS option to the
amd-pstate driver (Dhananjay Ugwekar).
- Reset amd-pstate driver mode after running selftests (Swapnil
Sapkal).
- Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan
Chancellor).
- Add helper for governor checks to the schedutil cpufreq governor and
move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki).
- Populate the cpu_capacity sysfs entries from the intel_pstate driver
after registering asym capacity support (Ricardo Neri).
- Add support for enabling Energy-aware scheduling (EAS) to the
intel_pstate driver when operating in the passive mode on a hybrid
platform (Rafael Wysocki).
- Drop redundant cpus_read_lock() from store_local_boost() in the
cpufreq core (Seyediman Seyedarab).
- Replace sscanf() with kstrtouint() in the cpufreq code and use a
symbol instead of a raw number in it (Bowen Yu).
- Add support for autonomous CPU performance state selection to the
CPPC cpufreq driver (Lifeng Zheng).
- OPP: Add dev_pm_opp_set_level() (Praveen Talari).
- Introduce scope-based cleanup headers and mutex locking guards in OPP
core (Viresh Kumar).
- Switch OPP to use kmemdup_array() (Zhang Enpei).
- Optimize bucket assignment when next_timer_ns equals KTIME_MAX in the
menu cpuidle governor (Zhongqiu Han).
- Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla).
- Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem
Bityutskiy).
- Fix typos in two comments in the teo cpuidle governor (Atul Kumar
Pant).
- Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
Kalla).
- Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki).
- Add new devm_ functions for enabling runtime PM and runtime PM
reference counting (Bence Csókás).
- Remove size arguments from strscpy() calls in the hibernation core
code (Thorsten Blum).
- Adjust the handling of devices with asynchronous suspend enabled
during system suspend and resume to start resuming them immediately
after resuming their parents and to start suspending such a device
immediately after suspending its first child (Rafael Wysocki).
- Adjust messages printed during tasks freezing to avoid using
pr_cont() (Andrew Sayers, Paul Menzel).
- Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
Zhang).
- Add missing wakeup source attribute relax_count to sysfs and
remove the space character at the end ofi the string produced by
pm_show_wakelocks() (Zijun Hu).
- Add configurable pm_test delay for hibernation (Zihuan Zhang).
- Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
cypd4226 device on Tegra boards from suspending prematurely (Jon
Hunter).
- Unbreak printing PM debug messages during hibernation and clean up
some related code (Rafael Wysocki).
- Add a systemd service to run cpupower and change cpupower binding's
Makefile to use -lcpupower (John B. Wyatt IV, Francesco Poli).
-----BEGIN PGP SIGNATURE-----
iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmg0xS0SHHJqd0Byand5
c29ja2kubmV0AAoJEO5fvZ0v1OO1AwwH/Rvgza5YBPb9JZqWJT/ZiBw7HcEWHhP1
fNfcVU1gXPZiF0yoPfjfJua6BcLj6lyQ3d/+zWqqAcWfmRSD6HPe8yYz8qALUAqj
RWhDa04aGj6B9bQuOjejatznYlQlkwCRT7zec+75D+dAHVMqR/Vt2LFAetCadgHe
MQibAQmVFXu3RFkBjReTAdGzVoTXkwoZDrzdfA2aFAfMJNtJpOW4atUZvnucuctv
VK3ZratrctCIw7yXEoB1nWSmlY7R5JlslplBfndjmmOnky3YxNr7C6paqwtbTWoF
MiX48qkmLOGeO6gS8s/lVCDQ4oZ+UNFQvXRsM5NGjycBikhHX/dp/w4=
=dIqJ
-----END PGP SIGNATURE-----
Merge tag 'pm-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"Once again, the changes are dominated by cpufreq updates, but this
time the majority of them are cpufreq core changes, mostly related to
the introduction of policy locking guards and __free() usage, and
fixes related to boost handling.
Still, there is also a significant update of the intel_pstate driver
making it register an energy model when running on a hybrid platform
which is used for enabling energy-aware scheduling (EAS) if the driver
operates in the passive mode (and schedutil is used as the cpufreq
governor for all CPUs which is the passive mode default).
There are some amd-pstate driver updates too, for a good measure,
including the "Requested CPU Min frequency" BIOS option support and
new online/offline callbacks.
In the cpuidle space, the most significant change is the addition of a
C1 demotion on/off sysfs knob to intel_idle which should help some
users to configure their systems more precisely. There is also the
conversion of the PSCI cpuidle driver to a faux device one and there
are two small updates of cpuidle governors.
Device power management is also modified quite a bit, especially the
handling of devices with asynchronous suspend and resume enabled
during system transitions. They are now going to be handled more
asynchronously during suspend transitions and somewhat less
aggressively during resume transitions.
Apart from the above, the operating performance points (OPP) library
is now going to use mutex locking guards and scope-based cleanup
helpers and there is the usual bunch of assorted fixes and code
cleanups.
Specifics:
- Fix potential division-by-zero error in em_compute_costs() (Yaxiong
Tian)
- Fix typos in energy model documentation and example driver code
(Moon Hee Lee, Atul Kumar Pant)
- Rearrange the energy model management code and add a new function
for adjusting a CPU energy model after adjusting the capacity of
the given CPU to it (Rafael Wysocki)
- Refactor cpufreq_online(), add and use cpufreq policy locking
guards, use __free() in policy reference counting, and clean up
core cpufreq code on top of that (Rafael Wysocki)
- Fix boost handling on CPU suspend/resume and sysfs updates (Viresh
Kumar)
- Fix des_perf clamping with max_perf in amd_pstate_update()
(Dhananjay Ugwekar)
- Add offline, online and suspend callbacks to the amd-pstate driver,
rename and use the existing amd_pstate_epp callbacks in it
(Dhananjay Ugwekar)
- Add support for the "Requested CPU Min frequency" BIOS option to
the amd-pstate driver (Dhananjay Ugwekar)
- Reset amd-pstate driver mode after running selftests (Swapnil
Sapkal)
- Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan
Chancellor)
- Add helper for governor checks to the schedutil cpufreq governor
and move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki)
- Populate the cpu_capacity sysfs entries from the intel_pstate
driver after registering asym capacity support (Ricardo Neri)
- Add support for enabling Energy-aware scheduling (EAS) to the
intel_pstate driver when operating in the passive mode on a hybrid
platform (Rafael Wysocki)
- Drop redundant cpus_read_lock() from store_local_boost() in the
cpufreq core (Seyediman Seyedarab)
- Replace sscanf() with kstrtouint() in the cpufreq code and use a
symbol instead of a raw number in it (Bowen Yu)
- Add support for autonomous CPU performance state selection to the
CPPC cpufreq driver (Lifeng Zheng)
- OPP: Add dev_pm_opp_set_level() (Praveen Talari)
- Introduce scope-based cleanup headers and mutex locking guards in
OPP core (Viresh Kumar)
- Switch OPP to use kmemdup_array() (Zhang Enpei)
- Optimize bucket assignment when next_timer_ns equals KTIME_MAX in
the menu cpuidle governor (Zhongqiu Han)
- Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla)
- Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem
Bityutskiy)
- Fix typos in two comments in the teo cpuidle governor (Atul Kumar
Pant)
- Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
Kalla)
- Move debug runtime PM attributes to runtime_attrs[] (Rafael
Wysocki)
- Add new devm_ functions for enabling runtime PM and runtime PM
reference counting (Bence Csókás)
- Remove size arguments from strscpy() calls in the hibernation core
code (Thorsten Blum)
- Adjust the handling of devices with asynchronous suspend enabled
during system suspend and resume to start resuming them immediately
after resuming their parents and to start suspending such a device
immediately after suspending its first child (Rafael Wysocki)
- Adjust messages printed during tasks freezing to avoid using
pr_cont() (Andrew Sayers, Paul Menzel)
- Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
Zhang)
- Add missing wakeup source attribute relax_count to sysfs and remove
the space character at the end ofi the string produced by
pm_show_wakelocks() (Zijun Hu)
- Add configurable pm_test delay for hibernation (Zihuan Zhang)
- Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
cypd4226 device on Tegra boards from suspending prematurely (Jon
Hunter)
- Unbreak printing PM debug messages during hibernation and clean up
some related code (Rafael Wysocki)
- Add a systemd service to run cpupower and change cpupower binding's
Makefile to use -lcpupower (John B. Wyatt IV, Francesco Poli)"
* tag 'pm-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
cpufreq: CPPC: Add support for autonomous selection
cpufreq: Update sscanf() to kstrtouint()
cpufreq: Replace magic number
OPP: switch to use kmemdup_array()
PM: freezer: Rewrite restarting tasks log to remove stray *done.*
PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn()
cpufreq: drop redundant cpus_read_lock() from store_local_boost()
cpupower: do not install files to /etc/default/
cpupower: do not call systemctl at install time
cpupower: do not write DESTDIR to cpupower.service
PM: sleep: Introduce pm_sleep_transition_in_progress()
cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver()
cpufreq: intel_pstate: Document hybrid processor support
cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache
cpufreq: intel_pstate: EAS support for hybrid platforms
PM: EM: Introduce em_adjust_cpu_capacity()
PM: EM: Move CPU capacity check to em_adjust_new_capacity()
PM: EM: Documentation: Fix typos in example driver code
cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas()
PM: sleep: Introduce pm_suspend_in_progress()
...
This bulk of the changes in this release are driver work, as well as
new device support we have some important work on performance over
several drivers, and big overhauls for maintainability on a couple too.
Highlights include:
- Big cleanups of the sh-msiof driver from Geert Uytterhoeven, and of
the NXP FSPI driver from Haibo Chen.
- Performance improvements for the AXI SPI engine.
- Support for writes to memory mapped flashes on Renesas devices.
- Integrated DMA support for Tegra210 QSPI, used by the Tegra234.
- DMA support for Amlogic SPI controllers.
- Support for AMD HID2, Qualcomm IPQ5018, Renesas RZ/G3E, Rockchip
RK3528 and Samsung Exynos Autov920.
An update to fix some issues with the Atmel QSPI driver runtime PM
pulled in a new API from the PM core, and the Renesas memory mapped
write changes pull in some code that's shared in drivers/memory.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmgzCzkACgkQJNaLcl1U
h9DF5gf+MlaB2j6ggH/OnXdV6aQTcjta357+ptWH3CDQbx0nsMj50i+eqxhzOK69
tyOMX0gcqo8bBQq1mhm3Z2fDYbYYRKwf7OwymHD1KGNGn9zegORNw790/XGTeoMo
m5f9Trz9UNP+7NjWuOioKoy6zghp4Ti2WAAUPhGUQBnyaV269bonNvisRtqnDP5u
nOu1DYANZqL3icuUIdJ3Aklmn4On1svUPuE9HOn/GNocZZWegcCHfBjmSRuLUAGO
RR1jP/d+RfSvTSrdqZGwrfWLFBBirjSLXNB1g7sjiVTnSseuBnOZ/z7PuBQQ+NuB
ENyoQXNLFZKMTQ5wmuQpXnwIIXsxYg==
=FmJ2
-----END PGP SIGNATURE-----
Merge tag 'spi-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"The bulk of the changes in this release are driver work, as well as
new device support we have some important work on performance over
several drivers, and big overhauls for maintainability on a couple
too. Highlights include:
- Big cleanups of the sh-msiof driver from Geert Uytterhoeven, and of
the NXP FSPI driver from Haibo Chen
- Performance improvements for the AXI SPI engine
- Support for writes to memory mapped flashes on Renesas devices
- Integrated DMA support for Tegra210 QSPI, used by the Tegra234
- DMA support for Amlogic SPI controllers
- Support for AMD HID2, Qualcomm IPQ5018, Renesas RZ/G3E, Rockchip
RK3528 and Samsung Exynos Autov920
An update to fix some issues with the Atmel QSPI driver runtime PM
pulled in a new API from the PM core, and the Renesas memory mapped
write changes pull in some code that's shared in drivers/memory"
* tag 'spi-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (90 commits)
spi: spi-qpic-snand: return early on error from qcom_spi_io_op()
spi: loopback-test: fix up const pointer issue in rx_ranges_cmp()
spi: gpio: fix const issue in spi_to_spi_gpio()
spi: spi-qpic-snand: remove superfluous parameters of qcom_spi_check_error()
dt-bindings: spi: samsung: add exynosautov920-spi compatible
spi: spi-qpic-snand: reuse qcom_spi_check_raw_flash_errors()
spi: dt-bindings: Add rk3528-spi compatible
spi: spi_amd: Update Kconfig dependencies
spi: spi_amd: Add HIDDMA basic write support
spi: spi_amd: Remove read{q,b} usage on DMA buffer
spi: sh-msiof: Move register definitions to <linux/spi/sh_msiof.h>
spi: sh-msiof: Document frame start sync pulse mode
spi: sh-msiof: Double maximum DMA transfer size using two groups
spi: sh-msiof: Simplify BRG's Division Ratio
spi: sh-msiof: Increase TX FIFO size for R-Car V4H/V4M
spi: sh-msiof: Correct RX FIFO size for R-Car Gen3
spi: sh-msiof: Correct RX FIFO size for R-Car Gen2
spi: sh-msiof: Add core support for dual-group transfers
spi: sh-msiof: Correct SIMDR2_GRPMASK
spi: sh-msiof: SIFCTR bitfield conversion
...
This release we have one new feature, support for chips that report edge
interrupts but don't provide distinct readback of that status per line,
plus a few cleanups.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmgzCB8ACgkQJNaLcl1U
h9Afygf/YAoq/f8qgQXEVAam10UTAB2utkM6Vgm9RM6U7duyn9ky5mg+/46P2Bjo
UUevRfDs/rW3WlGY9EYc/cHTxTqXOgj/SF3oIVDXz5QUvr31WS1lFpWs48lduPkH
GqiqZs12AD3v+iqCRk0UU95/OwylY8Vv2PmTZ14iZwLvuFLRazLztDTc89UE2op6
ld28gwNs9LK4TWA+qN3/+4zW3w5dcjaus/Xclccv+C+mE0Bv7BNCMHUTIDS84D+D
Fjh3hkJx4LpcoZlRVxR+bg16mvrA7fwjk4LH3Ipa4yjF1HRacbrWpoV6NIFXK/au
jpEQPBHf5Rv92hRZBwUNAf9YqYW30w==
=B8/l
-----END PGP SIGNATURE-----
Merge tag 'regmap-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"This release we have one new feature, support for chips that report
edge interrupts but don't provide distinct readback of that status per
line, plus a few cleanups"
* tag 'regmap-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: irq: Add support for chips without separate IRQ status
regmap-irq: Use dedicated interrupt wake setters
regmap: Move selecting for REGMAP_MDIO and REGMAP_IRQ
regcache: Use sort()'s default swap() implementation
GPIO core:
- use more lock guards where applicable
- refactor GPIO ACPI code and shrink it in the process by 8%
- move GPIO ACPI quirks into a separate file
- remove unneeded #ifdef
- convert GPIO devres helpers to using devm_add_action() where applicable
which shrinks and simplifies the code
- refactor GPIO descriptor validation in GPIO consumer interfaces
- don't allow setting values on input lines in the GPIO core which will
take off the burden from GPIO drivers of checking this down the line
- provide gpiod_is_equal() as a way of safely comparing two GPIO
descriptors (the only current user is in regulator core)
New drivers:
- add the GPIO module for the max77759 multifunction device
- add the GPIO driver for the VeriSilicon BLZP1600 GPIO controller
- add the GPIO driver for the Spacemit K1 SoC
Driver improvements:
- convert more drivers to using the new GPIO line value setter callbacks
- convert more drivers to making the irq_chip immutable as is recommended
by the interrupt subsystem
- extend build testing coverage by enabling more modules to be built with
COMPILE_TEST=y
- extend the gpio-aggregator module with a configfs interface that makes
the setup easier for user-space than the existing driver-level sysfs
attributes and also adds more advanced configuration features (such as
referring to aggregated lines by their original names or modifying
their names as exposed by the aggregated chip)
- add a missing mutex_destroy() in gpio-imx-scu
- add an OF polarity quirk for s5m8767
- allow building gpio-vf610 as a loadable module
- make gpio-mxc not hardcode its GPIO base number with GPIO SYSFS
interface disabled (another small step towards getting rid of the global
GPIO numberspace)
- add support for level-triggered interrupts to gpio-pca953x
- don't double-check the ngpios property in gpio-ds4520 as GPIO core
already does it
- don't double-check the number of GPIOs in gpio-imx-scu as GPIO core
already does it
- remove unused callbacks from gpio-max3191x
DT bindings:
- add device-tree bindings for max77759, spacemit,k1 and blzp1600 (new
drivers added this cycle)
- document more properties for gpio-vf610 and gpio-tegra186
- document a new pca95xx variant
- fix style of examples in several GPIO DT-binding documents
Misc:
- TODO list updates
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmg0NtQACgkQEacuoBRx
13Iolg/+P8fe1hTek+UgdKm/EAQ1Mn3oijNE1Ix15VD8Iqacu+URyB2SJMFcg27n
S/tsuwogQeQmdgXPfYDJkQmiZEyln/ytWf5W2lNwYhGfGujVa8h1FueB7Wb8Zs7G
PNMnobyAIGivodJfvikDEyczMuxhkOH04ZOT7UpTSPI47BSGsujX/1vgmRQLid1Z
3wFDJ0yDhVcuxit/VC+LzFpHIV0MiRzGpvHzYid5jjEaGSiRMpHixf27VJGc0gG1
IJLkhNkwZ3InisWVGvqdRg/FUNErRYKYQSARb4AjCU+/y1H0SWdB0R6sZDTZpP+e
YqAc8FW31Lw1L7PWBLRTaVS3KT868tdXDCsArNzfBbb3u/WikO2GY/AXuzveZatp
pHwyPA0JS9QvxaTXU9yjCpGqdNfjbrmU5OkZxTTe+Nyz84fUfiURiE8g4Rl6riy4
fNzaywRBmVZlEECWSWGzyNw9ZEYDRPZ1ZHmOA+8FWE+/XKJIsVf8w3x2QIC5b/HO
hYKH4mar8oiEYJFZqoko3iQURJq+AD9wILCNpws5bSsi//VyyNT0mZV/q5hj7+Xx
pqeEGDInvycN5fDWWJlkN1lj5dDyHZi4uus05mYI9Ec+eX3XNWRUHXUskbpzdgCs
XepjP9kFQmMSL7y4z2d7tLd7gFup/uGny7o/KyMsIPDw7qVL5rY=
=PQqp
-----END PGP SIGNATURE-----
Merge tag 'gpio-updates-for-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"We have three new drivers, some refactoring in the GPIO core, lots of
various changes across many drivers, new configfs interface for the
virtual gpio-aggregator module and DT-bindings updates.
The treewide conversion of GPIO drivers to using the new value setter
callbacks is ongoing with another round of GPIO drivers updated. You
will also see these commits coming in from other subsystems as with
the relevant changes merged into mainline last cycle, I've started
converting GPIO providers located elsewhere than drivers/gpio/.
GPIO core:
- use more lock guards where applicable
- refactor GPIO ACPI code and shrink it in the process by 8%
- move GPIO ACPI quirks into a separate file
- remove unneeded #ifdef
- convert GPIO devres helpers to using devm_add_action() where
applicable which shrinks and simplifies the code
- refactor GPIO descriptor validation in GPIO consumer interfaces
- don't allow setting values on input lines in the GPIO core which
will take off the burden from GPIO drivers of checking this down
the line
- provide gpiod_is_equal() as a way of safely comparing two GPIO
descriptors (the only current user is in regulator core)
New drivers:
- add the GPIO module for the max77759 multifunction device
- add the GPIO driver for the VeriSilicon BLZP1600 GPIO controller
- add the GPIO driver for the Spacemit K1 SoC
Driver improvements:
- convert more drivers to using the new GPIO line value setter
callbacks
- convert more drivers to making the irq_chip immutable as is
recommended by the interrupt subsystem
- extend build testing coverage by enabling more modules to be built
with COMPILE_TEST=y
- extend the gpio-aggregator module with a configfs interface that
makes the setup easier for user-space than the existing
driver-level sysfs attributes and also adds more advanced
configuration features (such as referring to aggregated lines by
their original names or modifying their names as exposed by the
aggregated chip)
- add a missing mutex_destroy() in gpio-imx-scu
- add an OF polarity quirk for s5m8767
- allow building gpio-vf610 as a loadable module
- make gpio-mxc not hardcode its GPIO base number with GPIO SYSFS
interface disabled (another small step towards getting rid of the
global GPIO numberspace)
- add support for level-triggered interrupts to gpio-pca953x
- don't double-check the ngpios property in gpio-ds4520 as GPIO core
already does it
- don't double-check the number of GPIOs in gpio-imx-scu as GPIO core
already does it
- remove unused callbacks from gpio-max3191x
DT bindings:
- add device-tree bindings for max77759, spacemit,k1 and blzp1600
(new drivers added this cycle)
- document more properties for gpio-vf610 and gpio-tegra186
- document a new pca95xx variant
- fix style of examples in several GPIO DT-binding documents
Misc:
- TODO list updates"
* tag 'gpio-updates-for-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (123 commits)
gpio: timberdale: select GPIOLIB_IRQCHIP
gpio: lpc18xx: select GPIOLIB_IRQCHIP
gpio: grgpio: select GPIOLIB_IRQCHIP
gpio: bcm-kona: select GPIOLIB_IRQCHIP
dt-bindings: gpio: vf610: add ngpios and gpio-reserved-ranges
gpio: davinci: select GPIOLIB_IRQCHIP
gpiolib-acpi: Update file references in the Documentation and MAINTAINERS
gpiolib: acpi: Move quirks to a separate file
gpiolib: acpi: Add acpi_gpio_need_run_edge_events_on_boot() getter
gpiolib: acpi: Handle deferred list via new API
gpiolib: acpi: Make sure we fill struct acpi_gpio_info
gpiolib: acpi: Switch to use enum in acpi_gpio_in_ignore_list()
gpiolib: acpi: Use temporary variable for struct acpi_gpio_info
gpiolib: remove unneeded #ifdef
gpio: mpc8xxx: select GPIOLIB_IRQCHIP
gpio: pxa: select GPIOLIB_IRQCHIP
gpio: pxa: Make irq_chip immutable
gpio: timberdale: Make irq_chip immutable
gpio: xgene-sb: Make irq_chip immutable
gpio: davinci: Make irq_chip immutable
...
- Convert init_timer*(), try_to_del_timer_sync() and
destroy_timer_on_stack() over to the canonical timer_*() namespace
convention.
There are is another large converstion pending, which has not been included
because it would have caused a gazillion of merge conflicts in next. The
conversion scripts will be run towards the end of the merge window and a
pull request sent once all conflict dependencies have been merged.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzgTkTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodwVD/97rF1Juqm1JZNIZPN/vMqwCxRoUkc6
tsK0+UC7UXusuJadxJ+Bsv25iPF+qejnThMU+SQ5yTVj/PNfxOe0WPdCEGGiL8Ye
2JCk6GqSOB/360SlLmtR1B1xHDwsuuUcQTz0w57CH66HRV5vpoWSMSwj/ypy+8nU
PlgjItaxdCKa9NJ+SUJZPWIxRkt/PsA1kwlV1OcxkgB++IiIHQEbPxECq9mlzWXF
b4Sq/Sdf2OmEePN+DYoey4fneRwJnkjkeX/o+CqosCPHRIiWUlSu5W/lU5IYojM3
s3XpMNNg/z8PMXR4JA2VaPYWLUZyBOs+3dM7Y6Am+z55EoxMxfzg6pGx2tfM4ftl
vF8wG3Z1c9MmpLk+P9LatNvfHeVLNve8KgOLa5phMDQ/El/a8KqLu6HmRDPONvKp
d6iXdPq1CP8P6jOtlFfzLmKPShgEcp+Zz9W3CaQR/0ZJEsEqrpKOLzdT86hJhBV0
mBCdzixmGtKAh0BdPdmg2FCLScqER3HKIJhZSdV8I+jSETIHCuMiIfbMXR7iwm/H
R1/ayvxrbc1mPseo28scqvo7m6cn5BFBxIUf4Sokp52ZCapz1v2aWzo4vHI0cTgT
ZOjlTrf+fgYLn1dqdD45TJiQPnmRrw4dU+WWSFRFJY2qjfyucj80vdqdkE5zkp5b
UPomlVimG4ccPg==
=FHGU
-----END PGP SIGNATURE-----
Merge tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"Another set of timer API cleanups:
- Convert init_timer*(), try_to_del_timer_sync() and
destroy_timer_on_stack() over to the canonical timer_*()
namespace convention.
There is another large conversion pending, which has not been included
because it would have caused a gazillion of merge conflicts in next.
The conversion scripts will be run towards the end of the merge window
and a pull request sent once all conflict dependencies have been
merged"
* tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack()
treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try()
timers: Rename init_timers() as timers_init()
timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA
timers: Rename __init_timer_on_stack() as __timer_init_on_stack()
timers: Rename __init_timer() as __timer_init()
timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack()
timers: Rename init_timer_key() as timer_init_key()
- Switch the MSI decriptor locking to lock guards
- Replace a broken and naive implementation of PCI/MSI-X control word
updates in the PCI/TPH driver with a properly serialized variant in the
PCI/MSI core code.
- Remove the MSI descriptor abuse in the SCCI/UFS/QCOM driver by
replacing the direct access to the MSI descriptors with the proper API
function calls. People will never understand that APIs exist for a
reason...
- Provide core infrastructre for the upcoming PCI endpoint library
extensions. Currently limited to ARM GICv3+, but in theory extensible
to other architectures.
- Provide a MSI domain::teardown() callback, which allows drivers to undo
the effects of the prepare() callback.
- Move the MSI domain::prepare() callback invocation to domain creation
time to avoid redundant (and in case of ARM/GIC-V3-ITS confusing)
invocations on every allocation.
In combination with the new teardown callback this removes some ugly
hacks in the GIC-V3-ITS driver, which pretended to work around the
short comings of the core code so far. With this update the code is
correct by design and implementation.
- Make the irqchip MSI library globally available, provide a MSI parent
domain creation helper and convert a bunch of (PCI/)MSI drivers over to
the modern MSI parent mechanism. This is the first step to get rid of
at least one incarnation of the three PCI/MSI management schemes.
- The usual small cleanups and improvements
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzgFsTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoR0KD/402K12tlI/D70H2aTG25dbTx+dkVk+
pKpJz0985uUlLJiPCR54dZL0ofcfRU+CdjEIf1I+6TPshtg6IWLJCfqu7OWVPYzz
2lJDO0yeUGwJqc0CIa1vttvJWvcUcxfWBX/ZSkOIM5avaXqSwRwsFNfd7TQ+T+eG
79VS1yyW197mUva53ekSF2voa8EEPWfEslAjoX1dRg5d4viAxaLtKm/KpBqo1oPh
Eb+E67xEWiIonvWNdr1AOisxnbi19PyDo1xnftgBToaeXXYBodNrNIAfAkx40YUZ
IZQLHvhZ91x15hXYIS4Cz1RXqPECbu/tHxs4AFUgGvqdgJUF89wzI3C21ymrKA6E
tDlWfpIcuE3vV/bsqj1gHGL5G5m1tyBRgIdIAOOmMoTHvwp5rrQtuZzpuqzGmEzj
iVIHnn5m08kRpOZQc7+PlxQMh3eunEyj9WWG49EJgoAnJPb5lou4shTwBUheHcKm
NXxKsfo4x5C+WehGTxv80UlnMcK3Yh/TuWf2OPR6QuT2iHP2VL5jyHjIs0ICn0cp
1tvSJtdc1rgvk/4Vn4lu5eyVaTx5ZAH8ZXNQfwwBTWTp3ZyAW+7GkaCq3LPaNJoZ
4LWpgZ5gs6wT+1XNT3boKdns81VolmeTI8P1ciQKpUtaTt6Cy9P/i2az/J+BCS4U
Fn5Qqk08PHGrUQ==
=OBMj
-----END PGP SIGNATURE-----
Merge tag 'irq-msi-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI updates from Thomas Gleixner:
"Updates for the MSI subsystem (core code and PCI):
- Switch the MSI descriptor locking to lock guards
- Replace a broken and naive implementation of PCI/MSI-X control word
updates in the PCI/TPH driver with a properly serialized variant in
the PCI/MSI core code.
- Remove the MSI descriptor abuse in the SCCI/UFS/QCOM driver by
replacing the direct access to the MSI descriptors with the proper
API function calls. People will never understand that APIs exist
for a reason...
- Provide core infrastructre for the upcoming PCI endpoint library
extensions. Currently limited to ARM GICv3+, but in theory
extensible to other architectures.
- Provide a MSI domain::teardown() callback, which allows drivers to
undo the effects of the prepare() callback.
- Move the MSI domain::prepare() callback invocation to domain
creation time to avoid redundant (and in case of ARM/GIC-V3-ITS
confusing) invocations on every allocation.
In combination with the new teardown callback this removes some
ugly hacks in the GIC-V3-ITS driver, which pretended to work around
the short comings of the core code so far. With this update the
code is correct by design and implementation.
- Make the irqchip MSI library globally available, provide a MSI
parent domain creation helper and convert a bunch of (PCI/)MSI
drivers over to the modern MSI parent mechanism. This is the first
step to get rid of at least one incarnation of the three PCI/MSI
management schemes.
- The usual small cleanups and improvements"
* tag 'irq-msi-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
PCI/MSI: Use bool for MSI enable state tracking
PCI: tegra: Convert to MSI parent infrastructure
PCI: xgene: Convert to MSI parent infrastructure
PCI: apple: Convert to MSI parent infrastructure
irqchip/msi-lib: Honour the MSI_FLAG_NO_AFFINITY flag
irqchip/mvebu: Convert to msi_create_parent_irq_domain() helper
irqchip/gic: Convert to msi_create_parent_irq_domain() helper
genirq/msi: Add helper for creating MSI-parent irq domains
irqchip: Make irq-msi-lib.h globally available
irqchip/gic-v3-its: Use allocation size from the prepare call
genirq/msi: Engage the .msi_teardown() callback on domain removal
genirq/msi: Move prepare() call to per-device allocation
irqchip/gic-v3-its: Implement .msi_teardown() callback
genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare()
irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask
dt-bindings: PCI: pci-ep: Add support for iommu-map and msi-map
irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS
irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable()
platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
genirq/msi: Rename msi_[un]lock_descs()
...
Boot code changes:
- A large series of changes to reorganize the x86 boot code into a better isolated
and easier to maintain base of PIC early startup code in arch/x86/boot/startup/,
by Ard Biesheuvel.
Motivation & background:
| Since commit
|
| c88d71508e ("x86/boot/64: Rewrite startup_64() in C")
|
| dated Jun 6 2017, we have been using C code on the boot path in a way
| that is not supported by the toolchain, i.e., to execute non-PIC C
| code from a mapping of memory that is different from the one provided
| to the linker. It should have been obvious at the time that this was a
| bad idea, given the need to sprinkle fixup_pointer() calls left and
| right to manipulate global variables (including non-pointer variables)
| without crashing.
|
| This C startup code has been expanding, and in particular, the SEV-SNP
| startup code has been expanding over the past couple of years, and
| grown many of these warts, where the C code needs to use special
| annotations or helpers to access global objects.
This tree includes the first phase of this work-in-progress x86 boot code
reorganization.
Scalability enhancements and micro-optimizations:
- Improve code-patching scalability (Eric Dumazet)
- Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper)
CPU features enumeration updates:
- Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S. Darwish)
- Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish, Thomas Gleixner)
- Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish)
Memory management changes:
- Allow temporary MMs when IRQs are on (Andy Lutomirski)
- Opt-in to IRQs-off activate_mm() (Andy Lutomirski)
- Simplify choose_new_asid() and generate better code (Borislav Petkov)
- Simplify 32-bit PAE page table handling (Dave Hansen)
- Always use dynamic memory layout (Kirill A. Shutemov)
- Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov)
- Make 5-level paging support unconditional (Kirill A. Shutemov)
- Stop prefetching current->mm->mmap_lock on page faults (Mateusz Guzik)
- Predict valid_user_address() returning true (Mateusz Guzik)
- Consolidate initmem_init() (Mike Rapoport)
FPU support and vector computing:
- Enable Intel APX support (Chang S. Bae)
- Reorgnize and clean up the xstate code (Chang S. Bae)
- Make task_struct::thread constant size (Ingo Molnar)
- Restore fpu_thread_struct_whitelist() to fix CONFIG_HARDENED_USERCOPY=y
(Kees Cook)
- Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg Nesterov)
- Always preserve non-user xfeatures/flags in __state_perm (Sean Christopherson)
Microcode loader changes:
- Help users notice when running old Intel microcode (Dave Hansen)
- AMD: Do not return error when microcode update is not necessary (Annie Li)
- AMD: Clean the cache if update did not load microcode (Boris Ostrovsky)
Code patching (alternatives) changes:
- Simplify, reorganize and clean up the x86 text-patching code (Ingo Molnar)
- Make smp_text_poke_batch_process() subsume smp_text_poke_batch_finish()
(Nikolay Borisov)
- Refactor the {,un}use_temporary_mm() code (Peter Zijlstra)
Debugging support:
- Add early IDT and GDT loading to debug relocate_kernel() bugs (David Woodhouse)
- Print the reason for the last reset on modern AMD CPUs (Yazen Ghannam)
- Add AMD Zen debugging document (Mario Limonciello)
- Fix opcode map (!REX2) superscript tags (Masami Hiramatsu)
- Stop decoding i64 instructions in x86-64 mode at opcode (Masami Hiramatsu)
CPU bugs and bug mitigations:
- Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov)
- Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov)
- Restructure and harmonize the various CPU bug mitigation methods
(David Kaplan)
- Fix spectre_v2 mitigation default on Intel (Pawan Gupta)
MSR API:
- Large MSR code and API cleanup (Xin Li)
- In-kernel MSR API type cleanups and renames (Ingo Molnar)
PKEYS:
- Simplify PKRU update in signal frame (Chang S. Bae)
NMI handling code:
- Clean up, refactor and simplify the NMI handling code (Sohil Mehta)
- Improve NMI duration console printouts (Sohil Mehta)
Paravirt guests interface:
- Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov)
SEV support:
- Share the sev_secrets_pa value again (Tom Lendacky)
x86 platform changes:
- Introduce the <asm/amd/> header namespace (Ingo Molnar)
- i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to <asm/amd/fch.h>
(Mario Limonciello)
Fixes and cleanups:
- x86 assembly code cleanups and fixes (Uros Bizjak)
- Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy Shevchenko,
Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav Petkov, Chang S. Bae,
Chao Gao, Dan Williams, Dave Hansen, David Kaplan, David Woodhouse,
Eric Biggers, Ingo Molnar, Josh Poimboeuf, Juergen Gross, Malaya Kumar Rout,
Mario Limonciello, Nathan Chancellor, Oleg Nesterov, Pawan Gupta,
Peter Zijlstra, Shivank Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak,
Xin Li)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy9WARHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jJSw/+OW2zvAx602doujBIE17vFLU7R10Xwj5H
lVgomkWCoTNscUZPhdT/iI+/kQF1fG8PtN9oZKUsTAUswldKJsqu7KevobviesiW
qI+FqH/fhHaIk7GVh9VP65Dgrdki8zsgd7BFxD8pLRBlbZTxTxXNNkuNJrs6LxJh
SxWp/FVtKo6Wd57qlUcsdo0tilAfcuhlEweFUarX55X2ouhdeHjcGNpxj9dHKOh8
M7R5yMYFrpfdpSms+WaCnKKahWHaIQtQTsPAyKwoVdtfl1kK+7NgaCF55Gbo3ogp
r59JwC/CGruDa5QnnDizCwFIwpZw9M52Q1NhP/eLEZbDGB4Yya3b5NW+Ya+6rPvO
ZZC3e1uUmlxW3lrYflUHurnwrVb2GjkQZOdf0gfnly/7LljIicIS2dk4qIQF9NBd
sQPpW5hjmIz9CsfeL8QaJW38pQyMsQWznFuz4YVuHcLHvleb3hR+n4fNfV5Lx9bw
oirVETSIT5hy/msAgShPqTqFUEiVCgp16ow20YstxxzFu/FQ+VG987tkeUyFkPMe
q1v5yF1hty+TkM4naKendIZ/MJnsrv0AxaegFz9YQrKGL1UPiOajQbSyKbzbto7+
ozmtN0W80E8n4oQq008j8htpgIhDV91UjF5m33qB82uSqKihHPPTsVcbeg5nZwh2
ti5g/a1jk94=
=JgQo
-----END PGP SIGNATURE-----
Merge tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
"Boot code changes:
- A large series of changes to reorganize the x86 boot code into a
better isolated and easier to maintain base of PIC early startup
code in arch/x86/boot/startup/, by Ard Biesheuvel.
Motivation & background:
| Since commit
|
| c88d71508e ("x86/boot/64: Rewrite startup_64() in C")
|
| dated Jun 6 2017, we have been using C code on the boot path in a way
| that is not supported by the toolchain, i.e., to execute non-PIC C
| code from a mapping of memory that is different from the one provided
| to the linker. It should have been obvious at the time that this was a
| bad idea, given the need to sprinkle fixup_pointer() calls left and
| right to manipulate global variables (including non-pointer variables)
| without crashing.
|
| This C startup code has been expanding, and in particular, the SEV-SNP
| startup code has been expanding over the past couple of years, and
| grown many of these warts, where the C code needs to use special
| annotations or helpers to access global objects.
This tree includes the first phase of this work-in-progress x86
boot code reorganization.
Scalability enhancements and micro-optimizations:
- Improve code-patching scalability (Eric Dumazet)
- Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper)
CPU features enumeration updates:
- Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S.
Darwish)
- Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish,
Thomas Gleixner)
- Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish)
Memory management changes:
- Allow temporary MMs when IRQs are on (Andy Lutomirski)
- Opt-in to IRQs-off activate_mm() (Andy Lutomirski)
- Simplify choose_new_asid() and generate better code (Borislav
Petkov)
- Simplify 32-bit PAE page table handling (Dave Hansen)
- Always use dynamic memory layout (Kirill A. Shutemov)
- Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov)
- Make 5-level paging support unconditional (Kirill A. Shutemov)
- Stop prefetching current->mm->mmap_lock on page faults (Mateusz
Guzik)
- Predict valid_user_address() returning true (Mateusz Guzik)
- Consolidate initmem_init() (Mike Rapoport)
FPU support and vector computing:
- Enable Intel APX support (Chang S. Bae)
- Reorgnize and clean up the xstate code (Chang S. Bae)
- Make task_struct::thread constant size (Ingo Molnar)
- Restore fpu_thread_struct_whitelist() to fix
CONFIG_HARDENED_USERCOPY=y (Kees Cook)
- Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg
Nesterov)
- Always preserve non-user xfeatures/flags in __state_perm (Sean
Christopherson)
Microcode loader changes:
- Help users notice when running old Intel microcode (Dave Hansen)
- AMD: Do not return error when microcode update is not necessary
(Annie Li)
- AMD: Clean the cache if update did not load microcode (Boris
Ostrovsky)
Code patching (alternatives) changes:
- Simplify, reorganize and clean up the x86 text-patching code (Ingo
Molnar)
- Make smp_text_poke_batch_process() subsume
smp_text_poke_batch_finish() (Nikolay Borisov)
- Refactor the {,un}use_temporary_mm() code (Peter Zijlstra)
Debugging support:
- Add early IDT and GDT loading to debug relocate_kernel() bugs
(David Woodhouse)
- Print the reason for the last reset on modern AMD CPUs (Yazen
Ghannam)
- Add AMD Zen debugging document (Mario Limonciello)
- Fix opcode map (!REX2) superscript tags (Masami Hiramatsu)
- Stop decoding i64 instructions in x86-64 mode at opcode (Masami
Hiramatsu)
CPU bugs and bug mitigations:
- Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov)
- Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov)
- Restructure and harmonize the various CPU bug mitigation methods
(David Kaplan)
- Fix spectre_v2 mitigation default on Intel (Pawan Gupta)
MSR API:
- Large MSR code and API cleanup (Xin Li)
- In-kernel MSR API type cleanups and renames (Ingo Molnar)
PKEYS:
- Simplify PKRU update in signal frame (Chang S. Bae)
NMI handling code:
- Clean up, refactor and simplify the NMI handling code (Sohil Mehta)
- Improve NMI duration console printouts (Sohil Mehta)
Paravirt guests interface:
- Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov)
SEV support:
- Share the sev_secrets_pa value again (Tom Lendacky)
x86 platform changes:
- Introduce the <asm/amd/> header namespace (Ingo Molnar)
- i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to
<asm/amd/fch.h> (Mario Limonciello)
Fixes and cleanups:
- x86 assembly code cleanups and fixes (Uros Bizjak)
- Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy
Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav
Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David
Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf,
Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan
Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank
Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)"
* tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits)
x86/bugs: Fix spectre_v2 mitigation default on Intel
x86/bugs: Restructure ITS mitigation
x86/xen/msr: Fix uninitialized variable 'err'
x86/msr: Remove a superfluous inclusion of <asm/asm.h>
x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only
x86/mm/64: Make 5-level paging support unconditional
x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model
x86/mm/64: Always use dynamic memory layout
x86/bugs: Fix indentation due to ITS merge
x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor()
x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter
x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter
x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2()
x86/cpuid: Rename have_cpuid_p() to cpuid_feature()
x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header
x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h>
x86/msr: Add rdmsrl_on_cpu() compatibility wrapper
x86/mm: Fix kernel-doc descriptions of various pgtable methods
x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too
x86/boot: Defer initialization of VM space related global variables
...
Merge updates related to system sleep handling and runtime PM for 6.16-rc1:
- Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
Kalla).
- Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki).
- Add new devm_ functions for enabling runtime PM and runtime PM
reference counting (Bence Csókás).
- Remove size arguments from strscpy() calls in the hibernation core
code (Thorsten Blum).
- Adjust the handling of devices with asynchronous suspend enabled
during system suspend and resume to start resuming them immediately
after resuming their parents and to start suspending such a device
immediately after suspending its first child (Rafael Wysocki).
- Adjust messages printed during tasks freezing to avoid using
pr_cont() (Andrew Sayers, Paul Menzel).
- Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
Zhang).
- Add missing wakeup source attribute relax_count to sysfs and
remove the space character at the end ofi the string produced by
pm_show_wakelocks() (Zijun Hu).
- Add configurable pm_test delay for hibernation (Zihuan Zhang).
- Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
cypd4226 device on Tegra boards from suspending prematurely (Jon
Hunter).
- Unbreak printing PM debug messages during hibernation and clean up
some related code (Rafael Wysocki).
* pm-runtime:
PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn()
PM: sysfs: Move debug runtime PM attributes to runtime_attrs[]
PM: runtime: Add new devm functions
* pm-sleep:
PM: freezer: Rewrite restarting tasks log to remove stray *done.*
PM: sleep: Introduce pm_sleep_transition_in_progress()
PM: sleep: Introduce pm_suspend_in_progress()
PM: sleep: Print PM debug messages during hibernation
ucsi_ccg: Disable async suspend in ucsi_ccg_probe()
PM: hibernate: add configurable delay for pm_test
PM: wakeup: Delete space in the end of string shown by pm_show_wakelocks()
PM: wakeup: Add missing wakeup source attribute relax_count
PM: sleep: Remove unnecessary !!
PM: sleep: Use two lines for "Restarting..." / "done" messages
PM: sleep: Make suspend of devices more asynchronous
PM: sleep: Suspend async parents after suspending children
PM: sleep: Resume children after resuming the parent
PM: hibernate: Remove size arguments when calling strscpy()
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmgwnGYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpq9aD/4iqOts77xhWWLrOJWkkhOcV5rREeyppq8X
MKYul9S4cc4Uin9Xou9a+nab31QBQEk3nsN3kX9o3yAXvkh6yUm36HD8qYNW/46q
IUkwRQQJ0COyTnexMZQNTbZPQDIYcenXmQxOcrEJ5jC1Jcz0sOKHsgekL+ab3kCy
fLnuz2ozvjGDMala/NmE8fN5qSlj4qQABHgbamwlwfo4aWu07cwfqn5G/FCYJgDO
xUvsnTVclom2g4G+7eSSvGQI1QyAxl5QpviPnj/TEgfFBFnhbCSoBTEY6ecqhlfW
6u59MF/Uw8E+weiuGY4L87kDtBhjQs3UMSLxCuwH7MxXb25ff7qB4AIkcFD0kKFH
3V5NtwqlU7aQT0xOjGxaHhfPwjLD+FVss4ARmuHS09/Kn8egOW9yROPyetnuH84R
Oz0Ctnt1IPLFjvGeg3+rt9fjjS9jWOXLITb9Q6nX9gnCt7orCwIYke8YCpmnJyhn
i+fV4CWYIQBBRKxIT0E/GhJxZOmL0JKpomnbpP2dH8npemnsTCuvtfdrK9gfhH2X
chBVqCPY8MNU5zKfzdEiavPqcm9392lMzOoOXW2pSC1eAKqnAQ86ZT3r7rLntqE8
75LxHcvaQIsnpyG+YuJVHvoiJ83TbqZNpyHwNaQTYhDmdYpp2d/wTtTQywX4DuXb
Y6NDJw5+kQ==
=1PNK
-----END PGP SIGNATURE-----
Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- ublk updates:
- Add support for updating the size of a ublk instance
- Zero-copy improvements
- Auto-registering of buffers for zero-copy
- Series simplifying and improving GET_DATA and request lookup
- Series adding quiesce support
- Lots of selftests additions
- Various cleanups
- NVMe updates via Christoph:
- add per-node DMA pools and use them for PRP/SGL allocations
(Caleb Sander Mateos, Keith Busch)
- nvme-fcloop refcounting fixes (Daniel Wagner)
- support delayed removal of the multipath node and optionally
support the multipath node for private namespaces (Nilay Shroff)
- support shared CQs in the PCI endpoint target code (Wilfred
Mallawa)
- support admin-queue only authentication (Hannes Reinecke)
- use the crc32c library instead of the crypto API (Eric Biggers)
- misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes
Reinecke, Leon Romanovsky, Gustavo A. R. Silva)
- MD updates via Yu:
- Fix that normal IO can be starved by sync IO, found by mkfs on
newly created large raid5, with some clean up patches for bdev
inflight counters
- Clean up brd, getting rid of atomic kmaps and bvec poking
- Add loop driver specifically for zoned IO testing
- Eliminate blk-rq-qos calls with a static key, if not enabled
- Improve hctx locking for when a plug has IO for multiple queues
pending
- Remove block layer bouncing support, which in turn means we can
remove the per-node bounce stat as well
- Improve blk-throttle support
- Improve delay support for blk-throttle
- Improve brd discard support
- Unify IO scheduler switching. This should also fix a bunch of lockdep
warnings we've been seeing, after enabling lockdep support for queue
freezing/unfreezeing
- Add support for block write streams via FDP (flexible data placement)
on NVMe
- Add a bunch of block helpers, facilitating the removal of a bunch of
duplicated boilerplate code
- Remove obsolete BLK_MQ pci and virtio Kconfig options
- Add atomic/untorn write support to blktrace
- Various little cleanups and fixes
* tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits)
selftests: ublk: add test for UBLK_F_QUIESCE
ublk: add feature UBLK_F_QUIESCE
selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
traceevent/block: Add REQ_ATOMIC flag to block trace events
ublk: run auto buf unregisgering in same io_ring_ctx with registering
io_uring: add helper io_uring_cmd_ctx_handle()
ublk: remove io argument from ublk_auto_buf_reg_fallback()
ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch()
selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK
selftests: ublk: support UBLK_F_AUTO_BUF_REG
ublk: support UBLK_AUTO_BUF_REG_FALLBACK
ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG
ublk: prepare for supporting to register request buffer automatically
ublk: convert to refcount_t
selftests: ublk: make IO & device removal test more stressful
nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk
nvme: introduce multipath_always_on module param
nvme-multipath: introduce delayed removal of the multipath head node
nvme-pci: derive and better document max segments limits
nvme-pci: use struct_size for allocation struct nvme_dev
...
Merge cpufreq updates for 6.16-rc1:
- Refactor cpufreq_online(), add and use cpufreq policy locking guards,
use __free() in policy reference counting, and clean up core cpufreq
code on top of that (Rafael Wysocki).
- Fix boost handling on CPU suspend/resume and sysfs updates (Viresh
Kumar).
- Fix des_perf clamping with max_perf in amd_pstate_update() (Dhananjay
Ugwekar).
- Add offline, online and suspend callbacks to the amd-pstate driver,
rename and use the existing amd_pstate_epp callbacks in it (Dhananjay
Ugwekar).
- Add support for the "Requested CPU Min frequency" BIOS option to the
amd-pstate driver (Dhananjay Ugwekar).
- Reset amd-pstate driver mode after running selftests (Swapnil
Sapkal).
- Add helper for governor checks to the schedutil cpufreq governor and
move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki).
- Populate the cpu_capacity sysfs entries from the intel_pstate driver
after registering asym capacity support (Ricardo Neri).
- Add support for enabling Energy-aware scheduling (EAS) to the
intel_pstate driver when operating in the passive mode on a hybrid
platform (Rafael Wysocki).
- Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan
Chancellor).
- Drop redundant cpus_read_lock() from store_local_boost() in the
cpufreq core (Seyediman Seyedarab).
- Replace sscanf() with kstrtouint() in the cpufreq code and use a
symbol instead of a raw number in it (Bowen Yu).
- Add support for autonomous CPU performance state selection to the
CPPC cpufreq driver (Lifeng Zheng).
* pm-cpufreq: (31 commits)
cpufreq: CPPC: Add support for autonomous selection
cpufreq: Update sscanf() to kstrtouint()
cpufreq: Replace magic number
cpufreq: drop redundant cpus_read_lock() from store_local_boost()
cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver()
cpufreq: intel_pstate: Document hybrid processor support
cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache
cpufreq: intel_pstate: EAS support for hybrid platforms
cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas()
cpufreq: intel_pstate: Populate the cpu_capacity sysfs entries
arch_topology: Relocate cpu_scale to topology.[h|c]
cpufreq/sched: Move cpufreq-specific EAS checks to cpufreq
cpufreq/sched: schedutil: Add helper for governor checks
amd-pstate-ut: Reset amd-pstate driver mode after running selftests
cpufreq/amd-pstate: Add support for the "Requested CPU Min frequency" BIOS option
cpufreq/amd-pstate: Add offline, online and suspend callbacks for amd_pstate_driver
cpufreq: Force sync policy boost with global boost on sysfs update
cpufreq: Preserve policy's boost state after resume
cpufreq: Introduce policy_set_boost()
cpufreq: Don't unnecessarily call set_boost()
...
Note - last minute rebase was to drop a typo patch that I'd accidentally
picked up (in the microblaze arch Kconfig)
Take 2 is due to that rebase messing up some fixes tags that were
referring to patches after that point.
There is a known merge conflict due to changes in neighbouring lines.
Stephen's resolution in linux-next is:
https://lore.kernel.org/linux-next/20250506155728.65605bae@canb.auug.org.au/
Added 3 named IIO reviewers to MAINTAINERS. This is a reflection of those
who have been doing much of this work for some time. Lars-Peter is
removed from the entry having moved on to other topics. Thanks
Nuno, David and Andy for stepping up and Lars-Peter for all your
hard work in the past!
Includes the usual mix of new device support, features and general
cleanup.
This time we also have some tree wide changes.
- Rip out the iio_device_claim_direct_scoped() as it proved hard to work
with. This series includes quite a few related cleanups such as use
of guard or factoring code out to allow direct returns.
- Switch from iio_device_claim/release_direct_mode() to new
iio_device_claim/release_direct() which is structured so that sparse
can warn on failed releases. There were a few false positives but
those were mostly in code that benefited from being cleaned up as part
of this process.
- Introduce iio_push_to_buffers_with_ts() to replace the _timestamp()
version over time. This version takes the size of the supplied buffer
which the core checks is at least as big as expected by calculation
from channel descriptions of those channels enabled. Use this in
an initial set of drivers.
- Add macros for IIO_DECLARE_BUFFER_WITH_TS() and
IIO_DECLARE_DMA_BUFFER_WITH_TS() to avoid lots of fiddly code to ensure
correctly aligned buffers for timestamps being added onto the end of
channel data.
New device support
------------------
adi,ad3530r
- New driver for AD3530, AD3530R, AD3531 and AD3531R DACs with
programmable gain controls. R variants have internal references.
adi,ad7476
- Add support (dt compatible only) for the Rohm BU79100G ADC which is
fully compatible with the ti,ads7866.
adi,ad7606
- Support ad7606c-16 and ad7606c-18 devices. Includes switch to dynamic
channel information allocation.
adi,ad7380
- Add support for the AD7389-4
dfrobot,sen0322
- New driver for this oxygen sensor.
mediatek,mt2701-auxadc
- Add binding for MT6893 which is fully compatible with already supported
MT8173.
meson-saradc
- Support the GXLX SoCs. Mostly this is a workaround for some unrelated
clock control bits found in the ADC register map.
nuvoton,nct7201
- New driver for NCT7201 and NCT7202 I2C ADCs.
rohm,bd79124
- New driver for this 12-bit, 8-channel SAR ADC.
- Switch to new set_rv etc gpio callbacks that were added in 6.15.
rohm,bd79703
- Add support for BD79700, BD79701 and BD79702 DACs that have subsets of
functionality of the already supported bd79703. Included making this
driver suitable for support device variants.
st,stm32-lptimer
- Add support for stm32pm25 to this trigger.
Features
--------
Beyond IIO
- Property iterator for named children.
core
- Enable writes for 64 bit integers used for standard IIO ABI elements.
Previously these could be read only.
- Helper library that should avoid code duplication for simpler ADC
bindings that have a child node per channel.
- Enforce that IIO_DMA_MINALIGN is always at least 8 (almost always true
and simplifies code on all significant architectures)
core/backend
- Add support to control source of data - useful when the HDL includes
things like generated ramps for testing purposes. Enable this for
adi-axi-dac
adi,ad3552-hs
- Add debugfs related callbacks to allow debug access to register contents.
adi,ad4000
- Support SPI offload with appropriate FPGA firmware along with improving
documentation.
adi,ad7293
- Add support for external reference voltage.
adi,ad7606
- Support SPI offload.
adi,ad7768-1
- Support reset GPIO.
adi,admv8818
- Support filter frequencies beyond 2^32.
adi,adxl345
- Add single and double tap events.
hid-sensor-prox
- Support 16-bit report sizes as seen on some Intel platforms.
invensense,icm42600
- Enable use of named interrupts to avoid problems with some wiring choices.
Get the interrupt by name, but fallback to previous assumption on the first
being INT1 if no names are supplied.
microchip,mcp3911
- Add reset gpio support.
rohm,bh7150
- Add reset gpio support.
st,stm32
- Add support to control oversampling.
ti,adc128s052
- Add support for ROHM BD79104 which is early compatible with the TI
parts already supported by this driver. Includes some general driver
cleanup and a separate dt binding.
- Simplify reference voltage handling by assuming it is fixed after enabling
the supply.
winsen,mhz19b
- New driver for this C02 sensor.
Cleanup and minor fixes
-----------------------
dt-bindings
- Correct indentation and style for DTS examples.
- Use unevalutateProperties for SPI devices instead of additionalProperties
to allow generic SPI properties from spi-peripheral-props.yaml
ABI Docs
- Add missing docs for sampling_frequency when it applies only to events.
Treewide
- Various minor tweaks, comment fixes and similar.
- Sort TI ADCs in Kconfig that had gotten out of order.
- Switch various drives that provide GPIO chip functionality to the new
callbacks with return values.
- Standardize on { } formatting for all array sentinels.
- Make use of aligned_s64 in a few places to replace either wrong types
or manually defined equivalents.
- Drop places where spi bits_per_word is set to 8 because that is the
default anyway.
adi,ad_sigma_delta library
- Avoid a potential use of uninitialized data if reg_size has a value
that is not supported (no drivers hit this but it is reasonable hardening)
adi,ad4030
- Add error checking for scan types and no longer store it in state.
- Rework code to reduce duplication.
- Move setting the mode from buffer preenable() to update_scan_mode(),
better matching expected semantics of the two different callbacks.
- Improve data marshalling comments.
adi,ad4695
- Use u16 for buffer elements as oversampling is not yet supported except
with SPI offload (which doesn't use this path).
adi,ad5592r
- Clean up destruction of mutexes.
- Use lock guards to simplify code (later patch fixes a missed unlock)
adi,ad5933
- Correct some incorrect settling times.
adi,ad7091
- Deduplicate handling of writable vs volatile registers as they are the
inverse of each other for this device.
adi,ad7124
- Fix 3db Filter frequency.
- Remove ability to directly write the filter frequency (which was broken)
- Register naming improvements.
adi,ad7606
- Add a missing return value check.
- Fill in max sampling rates for all chips.
- Use devm_mutex_init()
- Fix up some kernel-doc formatting issues.
- Remove some camel case that snuck in.
- Drop setting address field in channels as easily established from other
fields.
- Drop unnecessary parameter to ad76060_scale_setup_cb_t.
adi,ad7768-1
- Convert to regmap.
- Factor out buffer allocation.
- Tidy up headers.
adi,ad7944
- Stop setting bits_per_word in SPI xfers with no data.
adi,ad9832
- Add of_device_id table rather than just relying on fallbacks.
- Use FIELD_PREP() to set values of fields.
adi,admv1013
- Cleanup a pointless ternary.
adi,admv8818
- Fix up LPF Band 5 frequency which was slightly wrong.
- Fix an integer overflow.
- Fix range calculation
adi,adt7316
- Replace irqd_get_trigger_type(irq_get_irq_data()) with simpler
irq_get_trigger_type()
adi,adxl345
- Use regmap cache instead of various state variables that were there to
reduce bus accesses.
- Make regmap return value checking consistent across all call sites.
adi,axi-dac
- Add a check on number of channels (0 to 15 valid)
allwinner,sun20i
- Use new adc-helpers to replace local parsing code for channel nodes.
bosch,bmp290
- Move to local variables for sensor data marshalling removing the need
for a messy definition that has to work for all supported parts.
Follow up fix adds a missing initialization.
dynaimage,al3010 and dynaimage,al3320a
- Various minor cleanup to bring these drivers inline with reviewed feedback
given on a new driver.
- Fix an error path in which power down is not called when it should be.
- Switch to regmap.
google,cros_ec
- Fix up a flexible array in middle of structure warning.
- Flush fifo when changing the timeout to avoid potential long wait
for samples.
hid-sensor-rotation
- Remove an __aligned(16) marking that doesn't seem to be justified.
kionix,kxcjk-1013
- Deduplicate code for setting up interrupts.
microchip,mcp3911
- Fix handling of conversion results register which differs across supported
devices.
idt,zopt2201
- Avoid duplicating register lists as all volatile registers are the
inverse of writeable registers on this device.
renesas,rzg2l
- Use new adc-helpers to replace local parsing code for channel nodes.
ti,ads1298
- Fix a missing Kconfig dependency.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAmgt1EgRHGppYzIzQGtl
cm5lbC5vcmcACgkQVIU0mcT0Foii3g//bZJT2/rHGIkU/MKUsYQCZGh5ux7X59Kk
DT8R7i8RYg2dshbtkkHUn/S6yC3OX84q3Legzy8TtHkPvGMK1NSwWCrn2cqkCZdG
4kksfq3P41FmRrpSfm4p+WSAD9nYzA/LToSLdwGIOa44lKxlCElBgsIDz/HxwXLB
N08H5jniST+cmSWAkkxCj15jcKXGtuIUS3XSCffwlzQZNPHp7guk4u+5EiUaPwvy
X75iZ6SthUo9GfE6QyhE4rdOceix6S5wsQrKVpGKB7ZuQbXYQxlvNVt/l5p6MGdY
teoqB7CyUADD1yHZp9L85+JSqjbodxm5a/Y9b0wQKHV33dxphEVZeli9t+VZYg6K
fSM5xq0Cmt0msXLGRgB1uvZcEfj+vlkn7SN9nyyfTN2nXoac1E/MYCEiCLMhOj6D
jxwc1aQDwaT+f6U6mRyRab2+TPea7AOv81x0MyRclOArqY4NCOPIG+XvC2nouYXk
9MSyK4FIysbINjenFCZdpdibp3HKWGqc8rTv3NWrx7CejAYQoEkYW5WA9qyzy/I8
8gDbcQskcYcehrI8ztc1ratfKl5k0QJPCgcLvuTpxK3PkQCgX+dGnL1EXW7D4BRx
7Gl6O7gW7JR6kVx7LidJAxUSykoCdtPyiDxfEU8ypERiype2E6EJN/Bn11o1P8zr
VZ/PdGti5gg=
=vLKV
-----END PGP SIGNATURE-----
Merge tag 'iio-for-6.16a-take2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next
Jonathan writes:
IIO: New device support, features and cleanup for 6.16 - take 2
Note - last minute rebase was to drop a typo patch that I'd accidentally
picked up (in the microblaze arch Kconfig)
Take 2 is due to that rebase messing up some fixes tags that were
referring to patches after that point.
There is a known merge conflict due to changes in neighbouring lines.
Stephen's resolution in linux-next is:
https://lore.kernel.org/linux-next/20250506155728.65605bae@canb.auug.org.au/
Added 3 named IIO reviewers to MAINTAINERS. This is a reflection of those
who have been doing much of this work for some time. Lars-Peter is
removed from the entry having moved on to other topics. Thanks
Nuno, David and Andy for stepping up and Lars-Peter for all your
hard work in the past!
Includes the usual mix of new device support, features and general
cleanup.
This time we also have some tree wide changes.
- Rip out the iio_device_claim_direct_scoped() as it proved hard to work
with. This series includes quite a few related cleanups such as use
of guard or factoring code out to allow direct returns.
- Switch from iio_device_claim/release_direct_mode() to new
iio_device_claim/release_direct() which is structured so that sparse
can warn on failed releases. There were a few false positives but
those were mostly in code that benefited from being cleaned up as part
of this process.
- Introduce iio_push_to_buffers_with_ts() to replace the _timestamp()
version over time. This version takes the size of the supplied buffer
which the core checks is at least as big as expected by calculation
from channel descriptions of those channels enabled. Use this in
an initial set of drivers.
- Add macros for IIO_DECLARE_BUFFER_WITH_TS() and
IIO_DECLARE_DMA_BUFFER_WITH_TS() to avoid lots of fiddly code to ensure
correctly aligned buffers for timestamps being added onto the end of
channel data.
New device support
------------------
adi,ad3530r
- New driver for AD3530, AD3530R, AD3531 and AD3531R DACs with
programmable gain controls. R variants have internal references.
adi,ad7476
- Add support (dt compatible only) for the Rohm BU79100G ADC which is
fully compatible with the ti,ads7866.
adi,ad7606
- Support ad7606c-16 and ad7606c-18 devices. Includes switch to dynamic
channel information allocation.
adi,ad7380
- Add support for the AD7389-4
dfrobot,sen0322
- New driver for this oxygen sensor.
mediatek,mt2701-auxadc
- Add binding for MT6893 which is fully compatible with already supported
MT8173.
meson-saradc
- Support the GXLX SoCs. Mostly this is a workaround for some unrelated
clock control bits found in the ADC register map.
nuvoton,nct7201
- New driver for NCT7201 and NCT7202 I2C ADCs.
rohm,bd79124
- New driver for this 12-bit, 8-channel SAR ADC.
- Switch to new set_rv etc gpio callbacks that were added in 6.15.
rohm,bd79703
- Add support for BD79700, BD79701 and BD79702 DACs that have subsets of
functionality of the already supported bd79703. Included making this
driver suitable for support device variants.
st,stm32-lptimer
- Add support for stm32pm25 to this trigger.
Features
--------
Beyond IIO
- Property iterator for named children.
core
- Enable writes for 64 bit integers used for standard IIO ABI elements.
Previously these could be read only.
- Helper library that should avoid code duplication for simpler ADC
bindings that have a child node per channel.
- Enforce that IIO_DMA_MINALIGN is always at least 8 (almost always true
and simplifies code on all significant architectures)
core/backend
- Add support to control source of data - useful when the HDL includes
things like generated ramps for testing purposes. Enable this for
adi-axi-dac
adi,ad3552-hs
- Add debugfs related callbacks to allow debug access to register contents.
adi,ad4000
- Support SPI offload with appropriate FPGA firmware along with improving
documentation.
adi,ad7293
- Add support for external reference voltage.
adi,ad7606
- Support SPI offload.
adi,ad7768-1
- Support reset GPIO.
adi,admv8818
- Support filter frequencies beyond 2^32.
adi,adxl345
- Add single and double tap events.
hid-sensor-prox
- Support 16-bit report sizes as seen on some Intel platforms.
invensense,icm42600
- Enable use of named interrupts to avoid problems with some wiring choices.
Get the interrupt by name, but fallback to previous assumption on the first
being INT1 if no names are supplied.
microchip,mcp3911
- Add reset gpio support.
rohm,bh7150
- Add reset gpio support.
st,stm32
- Add support to control oversampling.
ti,adc128s052
- Add support for ROHM BD79104 which is early compatible with the TI
parts already supported by this driver. Includes some general driver
cleanup and a separate dt binding.
- Simplify reference voltage handling by assuming it is fixed after enabling
the supply.
winsen,mhz19b
- New driver for this C02 sensor.
Cleanup and minor fixes
-----------------------
dt-bindings
- Correct indentation and style for DTS examples.
- Use unevalutateProperties for SPI devices instead of additionalProperties
to allow generic SPI properties from spi-peripheral-props.yaml
ABI Docs
- Add missing docs for sampling_frequency when it applies only to events.
Treewide
- Various minor tweaks, comment fixes and similar.
- Sort TI ADCs in Kconfig that had gotten out of order.
- Switch various drives that provide GPIO chip functionality to the new
callbacks with return values.
- Standardize on { } formatting for all array sentinels.
- Make use of aligned_s64 in a few places to replace either wrong types
or manually defined equivalents.
- Drop places where spi bits_per_word is set to 8 because that is the
default anyway.
adi,ad_sigma_delta library
- Avoid a potential use of uninitialized data if reg_size has a value
that is not supported (no drivers hit this but it is reasonable hardening)
adi,ad4030
- Add error checking for scan types and no longer store it in state.
- Rework code to reduce duplication.
- Move setting the mode from buffer preenable() to update_scan_mode(),
better matching expected semantics of the two different callbacks.
- Improve data marshalling comments.
adi,ad4695
- Use u16 for buffer elements as oversampling is not yet supported except
with SPI offload (which doesn't use this path).
adi,ad5592r
- Clean up destruction of mutexes.
- Use lock guards to simplify code (later patch fixes a missed unlock)
adi,ad5933
- Correct some incorrect settling times.
adi,ad7091
- Deduplicate handling of writable vs volatile registers as they are the
inverse of each other for this device.
adi,ad7124
- Fix 3db Filter frequency.
- Remove ability to directly write the filter frequency (which was broken)
- Register naming improvements.
adi,ad7606
- Add a missing return value check.
- Fill in max sampling rates for all chips.
- Use devm_mutex_init()
- Fix up some kernel-doc formatting issues.
- Remove some camel case that snuck in.
- Drop setting address field in channels as easily established from other
fields.
- Drop unnecessary parameter to ad76060_scale_setup_cb_t.
adi,ad7768-1
- Convert to regmap.
- Factor out buffer allocation.
- Tidy up headers.
adi,ad7944
- Stop setting bits_per_word in SPI xfers with no data.
adi,ad9832
- Add of_device_id table rather than just relying on fallbacks.
- Use FIELD_PREP() to set values of fields.
adi,admv1013
- Cleanup a pointless ternary.
adi,admv8818
- Fix up LPF Band 5 frequency which was slightly wrong.
- Fix an integer overflow.
- Fix range calculation
adi,adt7316
- Replace irqd_get_trigger_type(irq_get_irq_data()) with simpler
irq_get_trigger_type()
adi,adxl345
- Use regmap cache instead of various state variables that were there to
reduce bus accesses.
- Make regmap return value checking consistent across all call sites.
adi,axi-dac
- Add a check on number of channels (0 to 15 valid)
allwinner,sun20i
- Use new adc-helpers to replace local parsing code for channel nodes.
bosch,bmp290
- Move to local variables for sensor data marshalling removing the need
for a messy definition that has to work for all supported parts.
Follow up fix adds a missing initialization.
dynaimage,al3010 and dynaimage,al3320a
- Various minor cleanup to bring these drivers inline with reviewed feedback
given on a new driver.
- Fix an error path in which power down is not called when it should be.
- Switch to regmap.
google,cros_ec
- Fix up a flexible array in middle of structure warning.
- Flush fifo when changing the timeout to avoid potential long wait
for samples.
hid-sensor-rotation
- Remove an __aligned(16) marking that doesn't seem to be justified.
kionix,kxcjk-1013
- Deduplicate code for setting up interrupts.
microchip,mcp3911
- Fix handling of conversion results register which differs across supported
devices.
idt,zopt2201
- Avoid duplicating register lists as all volatile registers are the
inverse of writeable registers on this device.
renesas,rzg2l
- Use new adc-helpers to replace local parsing code for channel nodes.
ti,ads1298
- Fix a missing Kconfig dependency.
* tag 'iio-for-6.16a-take2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (260 commits)
dt-bindings: iio: adc: Add ROHM BD79100G
iio: adc: add support for Nuvoton NCT7201
dt-bindings: iio: adc: add NCT7201 ADCs
iio: chemical: Add driver for SEN0322
dt-bindings: trivial-devices: Document SEN0322
iio: adc: ad7768-1: reorganize driver headers
iio: bmp280: zero-init buffer
iio: ssp_sensors: optimalize -> optimize
HID: sensor-hub: Fix typo and improve documentation
iio: admv1013: replace redundant ternary operator with just len
iio: chemical: mhz19b: Fix error code in probe()
iio: adc: at91-sama5d2: use IIO_DECLARE_BUFFER_WITH_TS
iio: accel: sca3300: use IIO_DECLARE_BUFFER_WITH_TS
iio: adc: ad7380: use IIO_DECLARE_DMA_BUFFER_WITH_TS
iio: adc: ad4695: rename AD4695_MAX_VIN_CHANNELS
iio: adc: ad4695: use IIO_DECLARE_DMA_BUFFER_WITH_TS
iio: introduce IIO_DECLARE_BUFFER_WITH_TS macros
iio: make IIO_DMA_MINALIGN minimum of 8 bytes
iio: pressure: zpa2326_spi: remove bits_per_word = 8
iio: pressure: ms5611_spi: remove bits_per_word = 8
...
Some GPIO chips allow to rise an IRQ on GPIO level changes but do not
provide an IRQ status for each separate line: only the current gpio
level can be retrieved.
Add support for these chips, emulating IRQ status by comparing GPIO
levels with the levels during the previous interrupt.
Signed-off-by: Mathieu Dubois-Briand <mathieu.dubois-briand@bootlin.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20250522-mdb-max7360-support-v9-5-74fc03517e41@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
On machines with multiple memory nodes, interleaving page allocations
across nodes allows for better utilization of each node's bandwidth.
Previous work by Gregory Price [1] introduced weighted interleave, which
allowed for pages to be allocated across nodes according to user-set
ratios.
Ideally, these weights should be proportional to their bandwidth, so that
under bandwidth pressure, each node uses its maximal efficient bandwidth
and prevents latency from increasing exponentially.
Previously, weighted interleave's default weights were just 1s -- which
would be equivalent to the (unweighted) interleave mempolicy, which goes
through the nodes in a round-robin fashion, ignoring bandwidth
information.
This patch has two main goals: First, it makes weighted interleave easier
to use for users who wish to relieve bandwidth pressure when using nodes
with varying bandwidth (CXL). By providing a set of "real" default
weights that just work out of the box, users who might not have the
capability (or wish to) perform experimentation to find the most optimal
weights for their system can still take advantage of bandwidth-informed
weighted interleave.
Second, it allows for weighted interleave to dynamically adjust to
hotplugged memory with new bandwidth information. Instead of manually
updating node weights every time new bandwidth information is reported or
taken off, weighted interleave adjusts and provides a new set of default
weights for weighted interleave to use when there is a change in bandwidth
information.
To meet these goals, this patch introduces an auto-configuration mode for
the interleave weights that provides a reasonable set of default weights,
calculated using bandwidth data reported by the system. In auto mode,
weights are dynamically adjusted based on whatever the current bandwidth
information reports (and responds to hotplug events).
This patch still supports users manually writing weights into the nodeN
sysfs interface by entering into manual mode. When a user enters manual
mode, the system stops dynamically updating any of the node weights, even
during hotplug events that shift the optimal weight distribution.
A new sysfs interface "auto" is introduced, which allows users to switch
between the auto (writing 1 or Y) and manual (writing 0 or N) modes. The
system also automatically enters manual mode when a nodeN interface is
manually written to.
There is one functional change that this patch makes to the existing
weighted_interleave ABI: previously, writing 0 directly to a nodeN
interface was said to reset the weight to the system default. Before this
patch, the default for all weights were 1, which meant that writing 0 and
1 were functionally equivalent. With this patch, writing 0 is invalid.
Link: https://lkml.kernel.org/r/20250520141236.2987309-1-joshua.hahnjy@gmail.com
[joshua.hahnjy@gmail.com: wordsmithing changes, simplification, fixes]
Link: https://lkml.kernel.org/r/20250511025840.2410154-1-joshua.hahnjy@gmail.com
[joshua.hahnjy@gmail.com: remove auto_kobj_attr field from struct sysfs_wi_group]
Link: https://lkml.kernel.org/r/20250512142511.3959833-1-joshua.hahnjy@gmail.comhttps://lore.kernel.org/linux-mm/20240202170238.90004-1-gregory.price@memverge.com/ [1]
Link: https://lkml.kernel.org/r/20250505182328.4148265-1-joshua.hahnjy@gmail.com
Co-developed-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Suggested-by: Yunjeong Mun <yunjeong.mun@sk.com>
Suggested-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: Ying Huang <ying.huang@linux.alibaba.com>
Suggested-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>
Reviewed-by: Honggyu Kim <honggyu.kim@sk.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use enable_irq_wake() and disable_irq_wake() instead of
calling low-level irq_set_irq_wake() with a parameter.
No functional changes.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20250521135538.1086717-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
If either REGMAP_IRQ or REGMAP_MDIO are set then REGMAP is also set.
This then enables the selecting of IRQ_DOMAIN or MDIO_BUS from REGMAP
based on the above two symbols respectively. This makes it very easy
to end up with "circular dependencies".
Instead select the IRQ_DOMAIN or MDIO_BUS from the symbols that make
use of them. This is almost equivalent to before but makes it less
likely to end up with false circular dependency detections.
Signed-off-by: Andrew Davis <afd@ti.com>
Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Closes: https://lore.kernel.org/r/bfe991fa-f54c-4d58-b2e0-34c4e4eb48f4@linaro.org/
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250516141722.13772-1-afd@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
pm_runtime_put_autosuspend() schedules a hrtimer to expire
at "dev->power.timer_expires". If the hrtimer's callback,
pm_suspend_timer_fn(), observes that the current time equals
"dev->power.timer_expires", it unexpectedly bails out instead of
proceeding with runtime suspend.
pm_suspend_timer_fn():
if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
dev->power.timer_expires = 0;
rpm_suspend(..)
}
Additionally, as ->timer_expires is not cleared, all the future auto
suspend requests will not schedule hrtimer to perform auto suspend.
rpm_suspend():
if ((rpmflags & RPM_AUTO) &&...) {
if (!(dev->power.timer_expires && ...) { <-- this will fail.
hrtimer_start_range_ns(&dev->power.suspend_timer,...);
}
}
Fix this by as well checking if current time reaches the set expiration.
Co-developed-by: Patrick Daly <quic_pdaly@quicinc.com>
Signed-off-by: Patrick Daly <quic_pdaly@quicinc.com>
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Link: https://patch.msgid.link/20250515064125.1211561-1-quic_charante@quicinc.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The "suspend in progress" check in device_wakeup_enable() does not
cover hibernation, but arguably it should do that, so introduce
pm_sleep_transition_in_progress() covering transitions during both
system suspend and hibernation to use in there and use it also in
pm_debug_messages_should_print().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/7820474.EvYhyI6sBW@rjwysocki.net
[ rjw: Move the new function definition under CONFIG_PM_SLEEP ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Introduce pm_suspend_in_progress() to be used for checking if a system-
wide suspend or resume transition is in progress, instead of comparing
pm_suspend_target_state directly to PM_SUSPEND_ON, and use it where
applicable.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/2020901.PYKUYFuaPT@rjwysocki.net
Prepare to resolve conflicts with an upstream series of fixes that conflict
with pending x86 changes:
6f5bf947ba Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge series from Bence Csókás <csokas.bence@prolan.hu>:
The probe() function of the atmel-quadspi driver got quite convoluted,
especially since the addition of SAMA7G5 support, that was forward-ported
from an older vendor kernel. During the port, a bug was introduced, where
the PM get() and put() calls were imbalanced. To alleivate this - and
similar problems in the future - an effort was made to migrate as many
functions as possible, to their devm_ managed counterparts. The few
functions, which did not yet have a devm_ variant, are added in patch 1 of
this series. Patch 2 then uses these APIs to fix the probe() function.
Patch series "memory,x86,acpi: hotplug memory alignment advisement", v8.
When physical address regions are not aligned to memory block size, the
misaligned portion is lost (stranded capacity).
Block size (min/max/selected) is architecture defined. Most architectures
tend to use the minimum block size or some simplistic heurist. On x86,
memory block size increases up to 2GB, and is otherwise fitted to the
alignment of non-hotplug (i.e. not special purpose memory).
CXL exposes its memory for management through the ACPI CEDT (CXL Early
Detection Table) in a field called the CXL Fixed Memory Window. Per the
CXL specification, this memory must be aligned to at least 256MB.
When a CFMW aligns on a size less than the block size, this causes a loss
of up to 2GB per CFMW on x86. It is not uncommon for CFMW to be allocated
per-device - though this behavior is BIOS defined.
This patch set provides 3 things:
1) implement advise/query functions in driverse/base/memory.c to
report/query architecture agnostic hotplug block alignment advice.
2) update x86 memblock size logic to consider the hotplug advice
3) add code in acpi/numa/srat.c to report CFMW alignment advice
The advisement interfaces are design to be called during arch_init code
prior to allocator and smp_init. start_kernel will call these through
setup_arch() (via acpi and mm/init_64.c on x86), which occurs prior to
mm_core_init and smp_init - so no need for atomics.
There's an attempt to signal callers to advise() that query has already
occurred, but this is predicated on the notion that query actually occurs
(which presently only happens on the x86 arch). This is to assist
debugging future users. Otherwise, the advise() call has been marked
__init to help static discovery of bad call times.
Once query is called the first time, it will always return the same value.
Interfaces return -EBUSY and 0 respectively on systems without hotplug.
This patch (of 3):
Hotplug memory sources may have opinions on what the memblock size should
be - usually for alignment purposes. For example, CXL memory extents can
be 256MB with a matching alignment. If this size/alignment is smaller
than the block size, it can result in stranded capacity.
Implement memory_block_advise_max_size for use prior to allocator init,
for software to advise the system on the max block size.
Implement memory_block_probe_max_size for use by arch init code to
calculate the best block size. Use of advice is architecture defined.
The probe value can never change after first probe. Calls to advise after
probe will return -EBUSY to aid debugging.
On systems without hotplug, always return -ENODEV and 0 respectively.
Link: https://lkml.kernel.org/r/20250127153405.3379117-1-gourry@gourry.net
Link: https://lkml.kernel.org/r/20250127153405.3379117-2-gourry@gourry.net
Signed-off-by: Gregory Price <gourry@gourry.net>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Bruno Faccini <bfaccini@nvidia.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haibo Xu <haibo1.xu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Robert Richter <rrichter@amd.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmgebIwACgkQaDWVMHDJ
krCGSA/+I+W/uqiz58Z2Zu4RrXMYFfKJxacF7My9wnOyRxaJduS3qrz1E5wHqBId
f6M8wDx9nS24UxDkBbi84NdtlG1zj8nV8djtszGKVeqHG2DcQMMOXBKZSjOmTo2b
GIZ3a3xEqXaFfnGQxXSZrvtHIwCmv10H2oyGHu0vBp/SJuWXNg72oivOGhbm0uWs
0/bdIK8+1sW7OAmhhKdvMVpmzL8TQJnkUHSkQilPB2Tsf9wWDfeY7kDkK5YwQpk2
ZK+hrmwCFXQZELY65F2+y/cFim/F38HiqVdvIkV1wFSVqVVE9hEKJ4BDZl1fXZKB
p4qpDFgxO27E/eMo9IZfxRH4TdSoK6YLWo9FGWHKBPnciJfAeO9EP/AwAIhEQRdx
YZlN9sGS6ja7O1Eh423BBw6cFj6ta0ck2T1PoYk32FXc6sgqCphsfvBD3+tJxz8/
xoZ3BzoErdPqSXbH5cSI972kQW0JLESiMTZa827qnJtT672t6uBcsnnmR0ZbJH1f
TJCC9qgwpBiEkiGW3gwv00SC7CkXo3o0FJw0pa3MkKHGd7csxBtGBHI1b6Jj+oB0
yWf1HxSqwrq2Yek8R7lWd4jIxyWfKriEMTu7xCMUUFlprKmR2RufsADvqclNyedQ
sGBCc4eu1cpZp2no/IFm+IvkuzUHnkS/WNL1LbZ9YI8h8unjZHE=
=UVgZ
-----END PGP SIGNATURE-----
Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 ITS mitigation from Dave Hansen:
"Mitigate Indirect Target Selection (ITS) issue.
I'd describe this one as a good old CPU bug where the behavior is
_obviously_ wrong, but since it just results in bad predictions it
wasn't wrong enough to notice. Well, the researchers noticed and also
realized that thus bug undermined a bunch of existing indirect branch
mitigations.
Thus the unusually wide impact on this one. Details:
ITS is a bug in some Intel CPUs that affects indirect branches
including RETs in the first half of a cacheline. Due to ITS such
branches may get wrongly predicted to a target of (direct or indirect)
branch that is located in the second half of a cacheline. Researchers
at VUSec found this behavior and reported to Intel.
Affected processors:
- Cascade Lake, Cooper Lake, Whiskey Lake V, Coffee Lake R, Comet
Lake, Ice Lake, Tiger Lake and Rocket Lake.
Scope of impact:
- Guest/host isolation:
When eIBRS is used for guest/host isolation, the indirect branches
in the VMM may still be predicted with targets corresponding to
direct branches in the guest.
- Intra-mode using cBPF:
cBPF can be used to poison the branch history to exploit ITS.
Realigning the indirect branches and RETs mitigates this attack
vector.
- User/kernel:
With eIBRS enabled user/kernel isolation is *not* impacted by ITS.
- Indirect Branch Prediction Barrier (IBPB):
Due to this bug indirect branches may be predicted with targets
corresponding to direct branches which were executed prior to IBPB.
This will be fixed in the microcode.
Mitigation:
As indirect branches in the first half of cacheline are affected, the
mitigation is to replace those indirect branches with a call to thunk that
is aligned to the second half of the cacheline.
RETs that take prediction from RSB are not affected, but they may be
affected by RSB-underflow condition. So, RETs in the first half of
cacheline are also patched to a return thunk that executes the RET aligned
to second half of cacheline"
* tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftest/x86/bugs: Add selftests for ITS
x86/its: FineIBT-paranoid vs ITS
x86/its: Use dynamic thunks for indirect branches
x86/ibt: Keep IBT disabled during alternative patching
mm/execmem: Unify early execmem_cache behaviour
x86/its: Align RETs in BHB clear sequence to avoid thunking
x86/its: Add support for RSB stuffing mitigation
x86/its: Add "vmexit" option to skip mitigation on some CPUs
x86/its: Enable Indirect Target Selection mitigation
x86/its: Add support for ITS-safe return thunk
x86/its: Add support for ITS-safe indirect thunk
x86/its: Enumerate Indirect Target Selection (ITS) bug
Documentation: x86/bugs/its: Add ITS documentation
Here is a single driver core fix for a regression for platform devices
that is a regression from a change that went into 6.15-rc1 that affected
Pixel devices. It has been in linux-next for over a week with no
reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaB9XmA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykMpgCgkKADII5bTybOB2Rkd2XRv1gh9ggAnjnoZZNT
WdtawFbS8G6EldLgH9lN
=gJoe
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fix from Greg KH:
"Here is a single driver core fix for a regression for platform devices
that is a regression from a change that went into 6.15-rc1 that
affected Pixel devices. It has been in linux-next for over a week with
no reported problems"
* tag 'driver-core-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
platform: Fix race condition during DMA configure at IOMMU probe time
Indirect Target Selection (ITS) is a bug in some pre-ADL Intel CPUs with
eIBRS. It affects prediction of indirect branch and RETs in the
lower half of cacheline. Due to ITS such branches may get wrongly predicted
to a target of (direct or indirect) branch that is located in the upper
half of the cacheline.
Scope of impact
===============
Guest/host isolation
--------------------
When eIBRS is used for guest/host isolation, the indirect branches in the
VMM may still be predicted with targets corresponding to branches in the
guest.
Intra-mode
----------
cBPF or other native gadgets can be used for intra-mode training and
disclosure using ITS.
User/kernel isolation
---------------------
When eIBRS is enabled user/kernel isolation is not impacted.
Indirect Branch Prediction Barrier (IBPB)
-----------------------------------------
After an IBPB, indirect branches may be predicted with targets
corresponding to direct branches which were executed prior to IBPB. This is
mitigated by a microcode update.
Add cmdline parameter indirect_target_selection=off|on|force to control the
mitigation to relocate the affected branches to an ITS-safe thunk i.e.
located in the upper half of cacheline. Also add the sysfs reporting.
When retpoline mitigation is deployed, ITS safe-thunks are not needed,
because retpoline sequence is already ITS-safe. Similarly, when call depth
tracking (CDT) mitigation is deployed (retbleed=stuff), ITS safe return
thunk is not used, as CDT prevents RSB-underflow.
To not overcomplicate things, ITS mitigation is not supported with
spectre-v2 lfence;jmp mitigation. Moreover, it is less practical to deploy
lfence;jmp mitigation on ITS affected parts anyways.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Some of the debug sysfs attributes for runtime PM are located
in the power_attrs[] table, so they are exposed even in the
pm_runtime_has_no_callbacks() case, unlike the other non-debug
sysfs attributes for runtime PM, which may be confusing.
Moreover, dev_attr_runtime_status.attr appears in two
places, which effectively causes it to be always exposed if
CONFIG_PM_ADVANCED_DEBUG is set, but otherwise it is exposed
only when pm_runtime_has_no_callbacks() returns 'false'.
Address this by putting all sysfs attributes for runtime PM into
runtime_attrs[].
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/12677254.O9o76ZdvQC@rjwysocki.net
There is wakeup source attribute 'active_count', but its counterpart
attribute 'relax_count' is missing.
Add 'relax_count' for consistency.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://patch.msgid.link/20250505-add_power_attrs-v1-1-10bc3c73c320@quicinc.com
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
arch_topology.c provides functionality to parse and scale CPU capacity.
It also provides a corresponding sysfs interface. Some architectures
parse and scale CPU capacity differently as per their own needs. On
Intel processors, for instance, it is responsibility of the Intel
P-state driver.
Relocate the implementation of that interface to a common location in
topology.c. Architectures can use the interface and populate it using
their own mechanisms.
An alternative approach would be to compile arch_topology.c even if
not needed only to get this interface. This approach would create
duplicated and conflicting functionality and data structures.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20250419025504.9760-2-ricardo.neri-calderon@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
platform_device_msi_init_and_alloc_irqs() performs two tasks: allocating
the MSI domain for a platform device, and allocate a number of MSIs in that
domain.
platform_device_msi_free_irqs_all() only frees the MSIs, and leaves the MSI
domain alive.
Given that platform_device_msi_init_and_alloc_irqs() is the sole tool a
platform device has to allocate platform MSIs, it makes sense for
platform_device_msi_free_irqs_all() to teardown the MSI domain at the same
time as the MSIs.
This avoids warnings and unexpected behaviours when a driver repeatedly
allocates and frees MSIs.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250414-ep-msi-v18-1-f69b49917464@nxp.com
The stat is always 0 now, so remove it and hardwire the user visible
output to 0.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20250505081138.3435992-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
To avoid a race between the IOMMU probing thread and the device driver
async probing thread during configuration of the platform DMA, update
`platform_dma_configure()` to read `dev->driver` once and test if it's
NULL before using it. This ensures that we don't de-reference an invalid
platform driver pointer if the device driver is asynchronously bound
while configuring the DMA.
Fixes: bcb81ac6ae ("iommu: Get DT/ACPI parsing into the proper probe path")
Signed-off-by: Will McVicker <willmcvicker@google.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20250424180420.3928523-1-willmcvicker@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This user of SHA-256 does not support any other algorithm, so the
crypto_shash abstraction provides no value. Just use the SHA-256
library API instead, which is much simpler and easier to use.
Also take advantage of printk's built-in hex conversion using %*phN.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250428190909.852705-1-ebiggers@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
A single series is present to properly handle the module_kobject creation.
It fixes a problem with missing /sys/module/<module>/drivers for built-in
modules.
The fix has been on linux-next for two weeks with no reported issues.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEIduBR9MnFA82q/jtumpXJwqY6poFAmgSH2YUHHBldHIucGF2
bHVAc3VzZS5jb20ACgkQumpXJwqY6poL/wf/TZEux9aieu8VOhPbV1Mo1npVAeT7
MJ5R4M6QKxPNvvBiXK5lWSVy5IPtcuwkbyEfKxV/CS668FwJeFpGFb91rRY108He
EUHjj5NtZ1WhEHFRBgJPLydGZGGtJzxy3yg26x6wO58VJrIx/H3HU3jgsnj1m32a
fA1cbo4Yo9gnk0YzI2KDu6A+bXi8zJVpyYDU9Ir4mdy+CVd5+vN9WypzrjHXbMya
2xhd9768sVmShY9K5+DlOXF4stVsP6CbgWGxhwIbdfLvY977QaBhrr+emrPE3uYt
5g+rg3v7ciuW14D5rLPWqZ5aXinjNt4vc7maNA9sJLW5wLOiWGXjhseUFg==
=6rT4
-----END PGP SIGNATURE-----
Merge tag 'modules-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux
Pull modules fixes from Petr Pavlu:
"A single series to properly handle the module_kobject creation.
This fixes a problem with missing /sys/module/<module>/drivers for
built-in modules"
* tag 'modules-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
drivers: base: handle module_kobject creation
kernel: globalize lookup_or_create_module_kobject()
kernel: refactor lookup_or_create_module_kobject()
kernel: param: rename locate_module_kobject
Add `devm_pm_runtime_set_active_enabled()` and
`devm_pm_runtime_get_noresume()` for simplifying
common cases in drivers.
Signed-off-by: Bence Csókás <csokas.bence@prolan.hu>
Link: https://patch.msgid.link/20250327195928.680771-3-csokas.bence@prolan.hu
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgOrWseHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGFyIH/AhXcuA8y8rk43mo
t+0GO7JR4dnr4DIl74GgDjCXlXiKCT7EXMfD/ABdofTxV4Pbyv+pUODlg1E6eO9U
C1WWM5PPNBGDDEVSQ3Yu756nr0UoiFhvW0R6pVdou5cezCWAtIF9LTN8DEUgis0u
EUJD9+/cHAMzfkZwabjm/HNsa1SXv2X47MzYv/PdHKr0htEPcNHF4gqBrBRdACGy
FJtaCKhuPf6TcDNXOFi5IEWMXrugReRQmOvrXqVYGa7rfUFkZgsAzRY6n/rUN5Z9
FAgle4Vlv9ohVYj9bXX8b6wWgqiKRpoN+t0PpRd6G6ict1AFBobNGo8LH3tYIKqZ
b/dCGNg=
=xDGd
-----END PGP SIGNATURE-----
Merge 6.15-rc4 into driver-core-next
We need the driver core fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In some code we would like to know if the action in device managed resources
was added by devm_add_action() family of calls. Introduce a helper for that.
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Zijun Hu <quic_zijuhu@quicinc.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250220162238.2738038-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaAvyTwAKCRCRxhvAZXjc
okgIAQDlcp1CoIBOxNaqg9PzuWjg4uaNeiKqDVliy83bu8zGhAEAmwLSVEVJGvJl
ACFkKgc7+qaa3p67UmXf2HVrIg79+ww=
=o5Iw
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.15-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- For some reason we went from zero to three maintainers for HFS/HFS+
in a matter of days. The lesson to learn from this might just be that
we need to threaten code removal more often!?
- Fix a regression introduced by enabling large folios for lage logical
block sizes. This has caused issues for noref migration with large
folios due to sleeping while in an atomic context.
New sleeping variants of pagecache lookup helpers are introduced.
These helpers take the folio lock instead of the mapping's private
spinlock. The problematic users are converted to the sleeping
variants and serialize against noref migration. Atomic users will
bail on seeing the new BH_Migrate flag.
This also shrinks the critical region of the mapping's private lock
and the new blocking callers reduce contention on the spinlock for
bdev mappings.
- Fix two bugs in do_move_mount() when with MOVE_MOUNT_BENEATH. The
first bug is using a mountpoint that is located on a mount we're not
holding a reference to. The second bug is putting the mountpoint
after we've called namespace_unlock() as it's no longer guaranteed
that it does stay a mountpoint.
- Remove a pointless call to vfs_getattr_nosec() in the devtmpfs code
just to query i_mode instead of simply querying the inode directly.
This also avoids lifetime issues for the dm code by an earlier bugfix
this cycle that moved bdev_statx() handling into vfs_getattr_nosec().
- Fix AT_FDCWD handling with getname_maybe_null() in the xattr code.
- Fix a performance regression for files when multiple callers issue a
close when it's not the last reference.
- Remove a duplicate noinline annotation from pipe_clear_nowait().
* tag 'vfs-6.15-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs/xattr: Fix handling of AT_FDCWD in setxattrat(2) and getxattrat(2)
MAINTAINERS: hfs/hfsplus: add myself as maintainer
splice: remove duplicate noinline from pipe_clear_nowait
devtmpfs: don't use vfs_getattr_nosec to query i_mode
fix a couple of races in MNT_TREE_BENEATH handling by do_move_mount()
fs: fall back to file_ref_put() for non-last reference
mm/migrate: fix sleep in atomic for large folios and buffer heads
fs/ext4: use sleeping version of sb_find_get_block()
fs/jbd2: use sleeping version of __find_get_block()
fs/ocfs2: use sleeping version of __find_get_block()
fs/buffer: use sleeping version of __find_get_block()
fs/buffer: introduce sleeping flavors for pagecache lookups
MAINTAINERS: add HFS/HFS+ maintainers
fs/buffer: split locking for pagecache lookups
The following 4 APIs are only used by drivers/base/power/wakeup.c
internally.
- wakeup_source_create()
- wakeup_source_destroy()
- wakeup_source_add()
- wakeup_source_remove()
Do not expose them by making them as static functions.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20250420-fix_power-v2-1-9b938d2283aa@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This code was originally going to use error pointers but we decided it
should return NULL instead. The error pointer code in
__devm_auxiliary_device_create() was left over from the first version.
Update it to use NULL. No callers have been merged yet, so that makes
this change simple and self contained.
Fixes: eaa0d30216 ("driver core: auxiliary bus: add device creation helpers")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Reviewed-by: Leon Romanovsky <leon@kernel.org>
Link: https://lore.kernel.org/r/aAi7Kg3aTguFD0fU@stanley.mountain
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The recent move of the bdev_statx call to the low-level vfs_getattr_nosec
helper caused it being used by devtmpfs, which leads to deadlocks in
md teardown due to the block device lookup and put interfering with the
unusual lifetime rules in md.
But as handle_remove only works on inodes created and owned by devtmpfs
itself there is no need to use vfs_getattr_nosec vs simply reading the
mode from the inode directly. Switch to that to avoid the bdev lookup
or any other unintentional side effect.
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reported-by: Xiao Ni <xni@redhat.com>
Fixes: 777d0961ff ("fs: move the bdex_statx call to vfs_getattr_nosec")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250423045941.1667425-1-hch@lst.de
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Xiao Ni <xni@redhat.com>
Tested-by: Ayush Jain <Ayush.jain3@amd.com>
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
There are a few use-cases where child nodes with a specific name need to
be parsed. Code like:
fwnode_for_each_child_node()
if (fwnode_name_eq())
...
can be found from a various drivers/subsystems. Adding a macro for this
can simplify things a bit.
In a few cases the data from the found nodes is later added to an array,
which is allocated based on the number of found nodes. One example of
such use is the IIO subsystem's ADC channel nodes, where the relevant
nodes are named as channel[@N].
Add helpers for iterating and counting device's sub-nodes with certain
name instead of open-coding this in every user.
Suggested-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Marcelo Schmitt <marcelo.schmitt1@gmail.com>
Link: https://patch.msgid.link/2767173b7b18e974c0bac244688214bd3863ff06.1742560649.git.mazziesaccount@gmail.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
In analogy with previous changes, make device_suspend_late() and
device_suspend_noirq() start the async suspend of the device's parent
after the device itself has been processed and make dpm_suspend_late()
and dpm_noirq_suspend_devices() start processing "async" leaf devices
(that is, devices without children) upfront so they don't need to wait
for the other devices they don't depend on.
This change reduces the total duration of device suspend on some systems
measurably, but not significantly.
Suggested-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1924195.CQOukoFCf9@rjwysocki.net
In analogy with the previous change affecting the resume path,
make device_suspend() start the async suspend of the device's parent
after the device itself has been processed and make dpm_suspend() start
processing "async" leaf devices (that is, devices without children)
upfront so they don't need to wait for the "sync" devices they don't
depend on.
On the Dell XPS13 9360 in my office, this change reduces the total
duration of device suspend by approximately 100 ms (over 20%).
Suggested-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3541233.QJadu78ljV@rjwysocki.net
According to [1], the handling of device suspend and resume, and
particularly the latter, involves unnecessary overhead related to
starting new async work items for devices that cannot make progress
right away because they have to wait for other devices.
To reduce this problem in the resume path, use the observation that
starting the async resume of the children of a device after resuming
the parent is likely to produce less scheduling and memory management
noise than starting it upfront while at the same time it should not
increase the resume duration substantially.
Accordingly, modify the code to start the async resume of the device's
children when the processing of the parent has been completed in each
stage of device resume and only start async resume upfront for devices
without parents.
Also make it check if a given device can be resumed asynchronously
before starting the synchronous resume of it in case it will have to
wait for another that is already resuming asynchronously.
In addition to making the async resume of devices more friendly to
systems with relatively less computing resources, this change is also
preliminary for analogous changes in the suspend path.
On the systems where it has been tested, this change by itself does
not affect the overall system resume duration in a measurable way.
Link: https://lore.kernel.org/linux-pm/20241114220921.2529905-1-saravanak@google.com/ [1]
Suggested-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/22630663.EfDdHjke4D@rjwysocki.net
Old microcode is bad for users and for kernel developers.
For users, it exposes them to known fixed security and/or functional
issues. These obviously rarely result in instant dumpster fires in
every environment. But it is as important to keep your microcode up
to date as it is to keep your kernel up to date.
Old microcode also makes kernels harder to debug. A developer looking
at an oops need to consider kernel bugs, known CPU issues and unknown
CPU issues as possible causes. If they know the microcode is up to
date, they can mostly eliminate known CPU issues as the cause.
Make it easier to tell if CPU microcode is out of date. Add a list
of released microcode. If the loaded microcode is older than the
release, tell users in a place that folks can find it:
/sys/devices/system/cpu/vulnerabilities/old_microcode
Tell kernel kernel developers about it with the existing taint
flag:
TAINT_CPU_OUT_OF_SPEC
== Discussion ==
When a user reports a potential kernel issue, it is very common
to ask them to reproduce the issue on mainline. Running mainline,
they will (independently from the distro) acquire a more up-to-date
microcode version list. If their microcode is old, they will
get a warning about the taint and kernel developers can take that
into consideration when debugging.
Just like any other entry in "vulnerabilities/", users are free to
make their own assessment of their exposure.
== Microcode Revision Discussion ==
The microcode versions in the table were generated from the Intel
microcode git repo:
8ac9378a8487 ("microcode-20241112 Release")
which as of this writing lags behind the latest microcode-20250211.
It can be argued that the versions that the kernel picks to call "old"
should be a revision or two old. Which specific version is picked is
less important to me than picking *a* version and enforcing it.
This repository contains only microcode versions that Intel has deemed
to be OS-loadable. It is quite possible that the BIOS has loaded a
newer microcode than the latest in this repo. If this happens, the
system is considered to have new microcode, not old.
Specifically, the sysfs file and taint flag answer the question:
Is the CPU running on the latest OS-loadable microcode,
or something even later that the BIOS loaded?
In other words, Intel never publishes an authoritative list of CPUs
and latest microcode revisions. Until it does, this is the best that
Linux can do.
Also note that the "intel-ucode-defs.h" file is simple, ugly and
has lots of magic numbers. That's on purpose and should allow a
single file to be shared across lots of stable kernel regardless of if
they have the new "VFM" infrastructure or not. It was generated with
a dumb script.
== FAQ ==
Q: Does this tell me if my system is secure or insecure?
A: No. It only tells you if your microcode was old when the
system booted.
Q: Should the kernel warn if the microcode list itself is too old?
A: No. New kernels will get new microcode lists, both mainline
and stable. The only way to have an old list is to be running
an old kernel in which case you have bigger problems.
Q: Is this for security or functional issues?
A: Both.
Q: If a given microcode update only has functional problems but
no security issues, will it be considered old?
A: Yes. All microcode image versions within a microcode release
are treated identically. Intel appears to make security
updates without disclosing them in the release notes. Thus,
all updates are considered to be security-relevant.
Q: Who runs old microcode?
A: Anybody with an old distro. This happens all the time inside
of Intel where there are lots of weird systems in labs that
might not be getting regular distro updates and might also
be running rather exotic microcode images.
Q: If I update my microcode after booting will it stop saying
"Vulnerable"?
A: No. Just like all the other vulnerabilies, you need to
reboot before the kernel will reassess your vulnerability.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "Ahmed S. Darwish" <darwi@linutronix.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/20250421195659.CF426C07%40davehans-spike.ostc.intel.com
(cherry picked from commit 9127865b15eb0a1bd05ad7efe29489c44394bdc1)
module_add_driver() relies on module_kset list for
/sys/module/<built-in-module>/drivers directory creation.
Since,
commit 96a1a2412a ("kernel/params.c: defer most of param_sysfs_init() to late_initcall time")
drivers which are initialized from subsys_initcall() or any other
higher precedence initcall couldn't find the related kobject entry
in the module_kset list because module_kset is not fully populated
by the time module_add_driver() refers it. As a consequence,
module_add_driver() returns early without calling make_driver_name().
Therefore, /sys/module/<built-in-module>/drivers is never created.
Fix this issue by letting module_add_driver() handle module_kobject
creation itself.
Fixes: 96a1a2412a ("kernel/params.c: defer most of param_sysfs_init() to late_initcall time")
Cc: stable@vger.kernel.org # requires all other patches from the series
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250227184930.34163-5-shyamsaini@linux.microsoft.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
software_node_get_reference_args() wants to get @index-th element, so
the property value requires at least '(index + 1) * sizeof(*ref)' bytes
but that can not be guaranteed by current OOB check, and may cause OOB
for malformed property.
Fix by using as OOB check '((index + 1) * sizeof(*ref) > prop->length)'.
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20250414-fix_swnode-v2-1-9c9e6ae11eab@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot reported a uaf in software_node_notify_remove. [1]
When any of the two sysfs_create_link() in software_node_notify() fails,
the swnode->kobj reference count will not increase normally, which will
cause swnode to be released incorrectly due to the imbalance of kobj reference
count when executing software_node_notify_remove().
Increase the reference count of kobj before creating the link to avoid uaf.
[1]
BUG: KASAN: slab-use-after-free in software_node_notify_remove+0x1bc/0x1c0 drivers/base/swnode.c:1108
Read of size 1 at addr ffff888033c08908 by task syz-executor105/5844
Freed by task 5844:
software_node_notify_remove+0x159/0x1c0 drivers/base/swnode.c:1106
device_platform_notify_remove drivers/base/core.c:2387 [inline]
Fixes: 9eb59204d5 ("iommufd/selftest: Add set_dev_pasid in mock iommu")
Reported-by: syzbot+2ff22910687ee0dfd48e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2ff22910687ee0dfd48e
Tested-by: syzbot+2ff22910687ee0dfd48e@syzkaller.appspotmail.com
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20250414071123.1228331-1-lizhi.xu@windriver.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>