mirror-linux/drivers/base
Davidlohr Bueso b980077899 mm: introduce per-node proactive reclaim interface
This adds support for proactive reclaim in general on a NUMA system.  A
per-node interface extends support beyond the memcg-specific interface
while respecting the current semantics of memory.reclaim: it honors LRU
aging and does not support artificially triggering eviction on nodes
belonging to non-bottom tiers.

This patch allows userspace to do:

     echo "512M swappiness=10" > /sys/devices/system/node/nodeX/reclaim
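
The write above can be wrapped in a small helper.  This is only a minimal
sketch: node_reclaim and the NODE_SYSFS_ROOT override are hypothetical
names introduced for illustration, not part of the patch, and only the
"<size> swappiness=<n>" request format shown above is assumed.  Per the
changelog note below, the write may fail with EBUSY when another reclaimer
holds the node; the sketch simply propagates that failure as a non-zero
exit status.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch):
#   node_reclaim NODE SIZE [SWAPPINESS]
# NODE_SYSFS_ROOT is an assumption of this sketch; it only exists so the
# sysfs location can be overridden, e.g. for a dry run against a temp dir.
node_reclaim() {
    nr_root="${NODE_SYSFS_ROOT:-/sys/devices/system/node}"
    nr_target="$nr_root/node$1/reclaim"

    # Bail out early if the node has no writable reclaim file.
    if [ ! -w "$nr_target" ]; then
        echo "node_reclaim: $nr_target is not writable" >&2
        return 1
    fi

    # Build the request in the format from the commit message,
    # appending swappiness= only when a value was given.
    nr_request="$2"
    if [ -n "${3:-}" ]; then
        nr_request="$nr_request swappiness=$3"
    fi

    # A failed write (for example EBUSY) surfaces as a non-zero status.
    printf '%s\n' "$nr_request" > "$nr_target"
}

# Example (typically needs root): node_reclaim 0 512M 10
```

Keeping the node number as a parameter mirrors the one-file-per-node
layout of the interface; callers that want to act on several nodes would
loop over them and check each exit status separately.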

One of the premises for this is to align semantically as closely as
possible with memory.reclaim.  For a brief time memcg did support a
nodemask, until 55ab834a86 (Revert "mm: add nodes= arg to memory.reclaim"),
because the semantics around reclaim (eviction) vs demotion were not
clear, which broke charging expectations.

With this approach:

1. Users who do not use memcg can benefit from proactive reclaim.  The
   memcg interface is not NUMA aware and there are use cases that
   focus on NUMA balancing rather than workload memory footprint.

2. Proactive reclaim on top tiers will trigger demotion, for which
   memory is still byte-addressable.  Reclaiming on the bottom nodes will
   trigger evicting to swap (the traditional sense of reclaim).  This
   follows the semantics of what is today part of the aging process on
   tiered memory, mirroring what every other form of reclaim does
   (reactive and memcg proactive reclaim).  Furthermore per-node proactive
   reclaim is not as susceptible to the memcg charging problem mentioned
   above.

3. Unlike the nodes= arg, this interface avoids confusing semantics,
   such as what exactly the user wants when mixing top-tier and low-tier
   nodes in the nodemask.  Further, the per-node interface is less exposed
   to "free up memory in my container" use cases, where eviction is intended.

4. Users that *really* want to free up memory can use proactive
   reclaim on nodes known to be on the bottom tiers to force eviction
   in a natural way - higher access latencies are still better than swap.
   If compelled, and while there are no guarantees and it is perhaps not
   worth the effort, users could also potentially follow a ladder-like
   approach to eventually free up the memory.  Alternatively, perhaps an
   'evict' option could be added to the parameters for both memory.reclaim
   and per-node interfaces to force this action unconditionally.

[akpm@linux-foundation.org: user_proactive_reclaim(): return -EBUSY on PGDAT_RECLAIM_LOCKED contention, per Roman]
[dave@stgolabs.net: memcg && node is also a bogus case, per Shakeel]
  Link: https://lkml.kernel.org/r/20250717235604.2atyx2aobwowpge3@offworld
Link: https://lkml.kernel.org/r/20250623185851.830632-5-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 18:59:53 -07:00
..
firmware_loader firmware_loader: use SHA-256 library API instead of crypto_shash API 2025-04-30 21:55:54 +02:00
power PM: Restrict swap use to later in the suspend sequence 2025-06-26 20:39:34 +02:00
regmap regmap: irq: Add support for chips without separate IRQ status 2025-05-22 13:11:50 +01:00
test
Kconfig
Makefile
arch_numa.c
arch_topology.c arch_topology: Relocate cpu_scale to topology.[h|c] 2025-05-07 21:56:55 +02:00
attribute_container.c
auxiliary.c Linux 6.15-rc4 2025-04-28 09:51:34 +02:00
auxiliary_sysfs.c
base.h
bus.c
cacheinfo.c
class.c
component.c
container.c
core.c
cpu.c Merge commit 'its-for-linus-20250509-merge' into x86/core, to resolve conflicts 2025-05-13 10:47:10 +02:00
dd.c
devcoredump.c
devres.c Immutable tag for the driver core tree to pull from 2025-04-28 10:18:29 +02:00
devtmpfs.c devtmpfs: don't use vfs_getattr_nosec to query i_mode 2025-04-25 12:11:45 +02:00
driver.c
faux.c driver core: faux: Quiet probe failures 2025-06-10 19:23:25 +02:00
firmware.c
hypervisor.c
init.c
isa.c
map.c
memory.c drivers/base/node: optimize memory block registration to reduce boot time 2025-07-09 22:41:59 -07:00
module.c drivers: base: handle module_kobject creation 2025-04-16 15:10:55 +02:00
node.c mm: introduce per-node proactive reclaim interface 2025-07-19 18:59:53 -07:00
physical_location.c
physical_location.h
pinctrl.c
platform-msi.c platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() 2025-05-07 17:49:00 +02:00
platform.c Merge 6.15-rc6 into driver-core-next 2025-05-12 14:16:34 +02:00
property.c Char/Misc/IIO pull request for 6.16-rc1 2025-06-06 11:50:47 -07:00
soc.c
swnode.c Linux 6.15-rc4 2025-04-28 09:51:34 +02:00
syscore.c
topology.c arch_topology: Relocate cpu_scale to topology.[h|c] 2025-05-07 21:56:55 +02:00
trace.c
trace.h
transport_class.c