From fc6bcf9ac4de76f5e7bcd020b3c0a86faff3f2d5 Mon Sep 17 00:00:00 2001
From: David Hildenbrand
Date: Tue, 21 Oct 2025 12:06:05 +0200
Subject: [PATCH 01/12] powerpc/pseries/cmm: call balloon_devinfo_init() also
 without CONFIG_BALLOON_COMPACTION

Patch series "powerpc/pseries/cmm: two smaller fixes".

Two smaller fixes identified while doing a bigger rework.


This patch (of 2):

We always have to initialize the balloon_dev_info, even when compaction
is not configured in: otherwise the containing list and the lock are left
uninitialized.

Likely not many such configs exist in practice, but let's CC stable to be
sure.

This was found by code inspection.

Link: https://lkml.kernel.org/r/20251021100606.148294-1-david@redhat.com
Link: https://lkml.kernel.org/r/20251021100606.148294-2-david@redhat.com
Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction")
Signed-off-by: David Hildenbrand
Reviewed-by: Ritesh Harjani (IBM)
Cc: Christophe Leroy
Cc: Madhavan Srinivasan
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---
 arch/powerpc/platforms/pseries/cmm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/cmm.c b/arch/powerpc/platforms/pseries/cmm.c
index 0823fa2da151..688f5fa1c724 100644
--- a/arch/powerpc/platforms/pseries/cmm.c
+++ b/arch/powerpc/platforms/pseries/cmm.c
@@ -550,7 +550,6 @@ static int cmm_migratepage(struct balloon_dev_info *b_dev_info,
 
 static void cmm_balloon_compaction_init(void)
 {
-        balloon_devinfo_init(&b_dev_info);
         b_dev_info.migratepage = cmm_migratepage;
 }
 #else /* CONFIG_BALLOON_COMPACTION */
@@ -572,6 +571,7 @@ static int cmm_init(void)
         if (!firmware_has_feature(FW_FEATURE_CMO) && !simulate)
                 return -EOPNOTSUPP;
 
+        balloon_devinfo_init(&b_dev_info);
         cmm_balloon_compaction_init();
 
         rc = register_oom_notifier(&cmm_oom_nb);

From 0da2ba35c0d532ca0fe7af698b17d74c4d084b9a Mon Sep 17 00:00:00 2001
From: David Hildenbrand
Date: Tue, 21 Oct 2025 12:06:06 +0200
Subject: [PATCH 02/12] powerpc/pseries/cmm: adjust BALLOON_MIGRATE when
 migrating pages

Let's properly adjust BALLOON_MIGRATE like the other drivers.

Note that the INFLATE/DEFLATE events are triggered from the core when
enqueueing/dequeueing pages.

This was found by code inspection.
Link: https://lkml.kernel.org/r/20251021100606.148294-3-david@redhat.com
Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction")
Signed-off-by: David Hildenbrand
Reviewed-by: Ritesh Harjani (IBM)
Cc: Christophe Leroy
Cc: Madhavan Srinivasan
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---
 arch/powerpc/platforms/pseries/cmm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/cmm.c b/arch/powerpc/platforms/pseries/cmm.c
index 688f5fa1c724..310dab4bc867 100644
--- a/arch/powerpc/platforms/pseries/cmm.c
+++ b/arch/powerpc/platforms/pseries/cmm.c
@@ -532,6 +532,7 @@ static int cmm_migratepage(struct balloon_dev_info *b_dev_info,
 
         spin_lock_irqsave(&b_dev_info->pages_lock, flags);
         balloon_page_insert(b_dev_info, newpage);
+        __count_vm_event(BALLOON_MIGRATE);
         b_dev_info->isolated_pages--;
         spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);

From 1cba2eba9b73d8dfee6b3e7465f510cace71637c Mon Sep 17 00:00:00 2001
From: Jinhui Guo
Date: Thu, 27 Nov 2025 17:25:12 +0800
Subject: [PATCH 03/12] mm/sparse: fix sparse_vmemmap_init_nid_early definition
 without CONFIG_SPARSEMEM

When CONFIG_SPARSEMEM is disabled, the macro
sparse_vmemmap_init_nid_early(_nid, _use) passes two arguments, while the
actual function accepts only nid.  Drop the extra argument _use.

Link: https://lkml.kernel.org/r/20251127092512.278-1-guojinhui.liam@bytedance.com
Fixes: d65917c42373 ("mm/sparse: allow for alternate vmemmap section init at boot")
Signed-off-by: Jinhui Guo
Cc: Frank van der Linden
Cc: Liam Howlett
Cc: Lorenzo Stoakes
Cc: Michal Hocko
Cc: Mike Rapoport
Cc: Suren Baghdasaryan
Cc: Vlastimil Babka
Cc: "David Hildenbrand (Red Hat)"
Signed-off-by: Andrew Morton
---
 include/linux/mmzone.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4398e027f450..75ef7c9f9307 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2289,7 +2289,7 @@ void sparse_init(void);
 #else
 #define sparse_init()   do {} while (0)
 #define sparse_index_init(_sec, _nid)  do {} while (0)
-#define sparse_vmemmap_init_nid_early(_nid, _use)  do {} while (0)
+#define sparse_vmemmap_init_nid_early(_nid)  do {} while (0)
 #define sparse_vmemmap_init_nid_late(_nid)  do {} while (0)
 #define pfn_in_present_section pfn_valid
 #define subsection_map_init(_pfn, _nr_pages) do {} while (0)

From bdd0d69a32c2aa6437d23e35acc705758b835a75 Mon Sep 17 00:00:00 2001
From: Zi Yan
Date: Wed, 26 Nov 2025 16:06:15 -0500
Subject: [PATCH 04/12] mm/huge_memory: change folio_split_supported() to
 folio_check_splittable()

Patch series "Improve folio split related functions", v4.

This patchset improves several folio split related functions to avoid
future misuse.  The changes are:

1. Consolidated folio splittable checks by moving truncated folio check,
   huge zero folio check, and writeback folio check into
   folio_split_supported().  Changed the function return type.  Renamed
   it to folio_check_splittable() for clarification.

2. Replaced can_split_folio() with open coded folio_expected_ref_count()
   and folio_ref_count() and introduced folio_cache_ref_count().

3. Changed min_order_for_split() to always return an order.

4. Fixed folio split stats counting.

Motivation
==========

This is based on Wei's observation[1] and solves several potential
issues:

1. Dereferencing NULL folio->mapping in try_folio_split_to_order() if it
   is called on truncated folios.
2. Not handling the negative return value of min_order_for_split() in
   mm/memory-failure.c.

There is no bug in the current code.


This patch (of 4):

folio_split_supported() used in try_folio_split_to_order() requires
folio->mapping to be non-NULL, but the current try_folio_split_to_order()
does not check it.  There is no issue in the current code, since
try_folio_split_to_order() is only used in truncate_inode_partial_folio(),
where folio->mapping is not NULL.

To prevent future misuse, move the folio->mapping NULL check (i.e., folio
is truncated) into folio_split_supported().  Since the folio->mapping NULL
check returns -EBUSY and folio_split_supported() == false means -EINVAL,
change the folio_split_supported() return type from bool to int and return
error numbers accordingly.  Rename folio_split_supported() to
folio_check_splittable() to match the return type change.

While at it, move the is_huge_zero_folio() check and the
folio_test_writeback() check into folio_check_splittable() and add
kernel-doc.  Remove all warnings inside folio_check_splittable() and give
warnings in __folio_split() instead, so that the bool warns parameter can
be removed.

Link: https://lkml.kernel.org/r/20251126210618.1971206-1-ziy@nvidia.com
Link: https://lkml.kernel.org/r/20251126210618.1971206-2-ziy@nvidia.com
Signed-off-by: Zi Yan
Reviewed-by: Wei Yang
Acked-by: Balbir Singh
Acked-by: David Hildenbrand (Red Hat)
Cc: Baolin Wang
Cc: Barry Song
Cc: Dev Jain
Cc: Lance Yang
Cc: Liam Howlett
Cc: Lorenzo Stoakes
Cc: Miaohe Lin
Cc: Naoya Horiguchi
Cc: Nico Pache
Cc: Ryan Roberts
Signed-off-by: Andrew Morton
---
 include/linux/huge_mm.h |  6 ++--
 mm/huge_memory.c        | 76 +++++++++++++++++++++++------------------
 2 files changed, 46 insertions(+), 36 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1d439de1ca2c..66105a90b4c3 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -375,8 +375,8 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list
 int folio_split_unmapped(struct folio *folio, unsigned int new_order);
 int min_order_for_split(struct folio *folio);
 int split_folio_to_list(struct folio *folio, struct list_head *list);
-bool folio_split_supported(struct folio *folio, unsigned int new_order,
-                enum split_type split_type, bool warns);
+int folio_check_splittable(struct folio *folio, unsigned int new_order,
+                enum split_type split_type);
 int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
                 struct list_head *list);
@@ -407,7 +407,7 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
 static inline int try_folio_split_to_order(struct folio *folio,
                 struct page *page, unsigned int new_order)
 {
-        if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
+        if (folio_check_splittable(folio, new_order, SPLIT_TYPE_NON_UNIFORM))
                 return split_huge_page_to_order(&folio->page, new_order);
         return folio_split(folio, new_order, page, NULL);
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 041b554c7115..8c2516ac9ce7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3688,15 +3688,40 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
         return 0;
 }
 
-bool folio_split_supported(struct folio *folio, unsigned int new_order,
-                enum split_type split_type, bool warns)
+/**
+ * folio_check_splittable() - check if a folio can be split to a given order
+ * @folio: folio to be split
+ * @new_order: the smallest order of the after split folios (since buddy
+ *             allocator like split generates folios with orders from
+ *             @folio's order - 1 to new_order).
+ * @split_type: uniform or non-uniform split
+ *
+ * folio_check_splittable() checks if @folio can be split to @new_order using
+ * @split_type method.  The truncated folio check must come first.
+ *
+ * Context: folio must be locked.
+ *
+ * Return: 0 - @folio can be split to @new_order, otherwise an error number is
+ * returned.
+ */
+int folio_check_splittable(struct folio *folio, unsigned int new_order,
+                enum split_type split_type)
 {
+        VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+
+        /*
+         * Folios that just got truncated cannot get split. Signal to the
+         * caller that there was a race.
+         *
+         * TODO: this will also currently refuse folios without a mapping in the
+         * swapcache (shmem or to-be-anon folios).
+         */
+        if (!folio->mapping && !folio_test_anon(folio))
+                return -EBUSY;
+
         if (folio_test_anon(folio)) {
                 /* order-1 is not supported for anonymous THP. */
-                VM_WARN_ONCE(warns && new_order == 1,
-                             "Cannot split to order-1 folio");
                 if (new_order == 1)
-                        return false;
+                        return -EINVAL;
         } else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
                 if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
                     !mapping_large_folio_support(folio->mapping)) {
@@ -3717,9 +3742,7 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order,
                          * case, the mapping does not actually support large
                          * folios properly.
                          */
-                        VM_WARN_ONCE(warns,
-                                     "Cannot split file folio to non-0 order");
-                        return false;
+                        return -EINVAL;
                 }
         }
 
@@ -3732,12 +3755,16 @@ bool folio_split_supported(struct folio *folio, unsigned int new_order,
          * here.
          */
         if ((split_type == SPLIT_TYPE_NON_UNIFORM || new_order) && folio_test_swapcache(folio)) {
-                VM_WARN_ONCE(warns,
-                             "Cannot split swapcache folio to non-0 order");
-                return false;
+                return -EINVAL;
         }
 
-        return true;
+        if (is_huge_zero_folio(folio))
+                return -EINVAL;
+
+        if (folio_test_writeback(folio))
+                return -EBUSY;
+
+        return 0;
 }
 
 static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int new_order,
@@ -3922,7 +3949,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
         int remap_flags = 0;
         int extra_pins, ret;
         pgoff_t end = 0;
-        bool is_hzp;
 
         VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
         VM_WARN_ON_ONCE_FOLIO(!folio_test_large(folio), folio);
@@ -3930,31 +3956,15 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
         if (folio != page_folio(split_at) || folio != page_folio(lock_at))
                 return -EINVAL;
 
-        /*
-         * Folios that just got truncated cannot get split. Signal to the
-         * caller that there was a race.
-         *
-         * TODO: this will also currently refuse shmem folios that are in the
-         * swapcache.
-         */
-        if (!is_anon && !folio->mapping)
-                return -EBUSY;
-
         if (new_order >= old_order)
                 return -EINVAL;
 
-        if (!folio_split_supported(folio, new_order, split_type, /* warn = */ true))
-                return -EINVAL;
-
-        is_hzp = is_huge_zero_folio(folio);
-        if (is_hzp) {
-                pr_warn_ratelimited("Called split_huge_page for huge zero page\n");
-                return -EBUSY;
+        ret = folio_check_splittable(folio, new_order, split_type);
+        if (ret) {
+                VM_WARN_ONCE(ret == -EINVAL, "Tried to split an unsplittable folio");
+                return ret;
         }
 
-        if (folio_test_writeback(folio))
-                return -EBUSY;
-
         if (is_anon) {
                 /*
                  * The caller does not necessarily hold an mmap_lock that would

From 5842bcbfc316738cbfcbdb4def5a7592aa03ebf2 Mon Sep 17 00:00:00 2001
From: Zi Yan
Date: Wed, 26 Nov 2025 16:06:16 -0500
Subject: [PATCH 05/12] mm/huge_memory: replace can_split_folio() with direct
 refcount calculation

can_split_folio() is just a refcount comparison, making sure only the
split caller holds an extra pin.  Open code it with
folio_expected_ref_count() != folio_ref_count() - 1.

For the extra_pins used by folio_ref_freeze(), add folio_cache_ref_count()
to calculate it.  Also replace folio_expected_ref_count() with
folio_cache_ref_count() used by folio_ref_unfreeze(), since they are
returning the same values when a folio is frozen and
folio_cache_ref_count() does not have unnecessary folio_mapcount() in its
implementation.

Link: https://lkml.kernel.org/r/20251126210618.1971206-3-ziy@nvidia.com
Signed-off-by: Zi Yan
Suggested-by: David Hildenbrand (Red Hat)
Reviewed-by: Wei Yang
Acked-by: David Hildenbrand (Red Hat)
Cc: Balbir Singh
Cc: Baolin Wang
Cc: Barry Song
Cc: Dev Jain
Cc: Lance Yang
Cc: Liam Howlett
Cc: Lorenzo Stoakes
Cc: Miaohe Lin
Cc: Naoya Horiguchi
Cc: Nico Pache
Cc: Ryan Roberts
Signed-off-by: Andrew Morton
---
 include/linux/huge_mm.h |  1 -
 mm/huge_memory.c        | 52 ++++++++++++++++-------------------------
 mm/vmscan.c             |  3 ++-
 3 files changed, 22 insertions(+), 34 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 66105a90b4c3..8a52e20387b0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -369,7 +369,6 @@ enum split_type {
         SPLIT_TYPE_NON_UNIFORM,
 };
 
-bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
 int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
                 unsigned int new_order);
 int folio_split_unmapped(struct folio *folio, unsigned int new_order);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8c2516ac9ce7..5ce00d53b19e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3455,23 +3455,6 @@ static void lru_add_split_folio(struct folio *folio, struct folio *new_folio,
         }
 }
 
-/* Racy check whether the huge page can be split */
-bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
-{
-        int extra_pins;
-
-        /* Additional pins from page cache */
-        if (folio_test_anon(folio))
-                extra_pins = folio_test_swapcache(folio) ?
-                                folio_nr_pages(folio) : 0;
-        else
-                extra_pins = folio_nr_pages(folio);
-        if (pextra_pins)
-                *pextra_pins = extra_pins;
-        return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins -
-                                        caller_pins;
-}
-
 static bool page_range_has_hwpoisoned(struct page *page, long nr_pages)
 {
         for (; nr_pages; page++, nr_pages--)
@@ -3767,11 +3750,19 @@ int folio_check_splittable(struct folio *folio, unsigned int new_order,
         return 0;
 }
 
+/* Number of folio references from the pagecache or the swapcache. */
+static unsigned int folio_cache_ref_count(const struct folio *folio)
+{
+        if (folio_test_anon(folio) && !folio_test_swapcache(folio))
+                return 0;
+        return folio_nr_pages(folio);
+}
+
 static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int new_order,
                 struct page *split_at, struct xa_state *xas,
                 struct address_space *mapping, bool do_lru,
                 struct list_head *list, enum split_type split_type,
-                pgoff_t end, int *nr_shmem_dropped, int extra_pins)
+                pgoff_t end, int *nr_shmem_dropped)
 {
         struct folio *end_folio = folio_next(folio);
         struct folio *new_folio, *next;
@@ -3782,10 +3773,9 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
         VM_WARN_ON_ONCE(!mapping && end);
         /* Prevent deferred_split_scan() touching ->_refcount */
         ds_queue = folio_split_queue_lock(folio);
-        if (folio_ref_freeze(folio, 1 + extra_pins)) {
+        if (folio_ref_freeze(folio, folio_cache_ref_count(folio) + 1)) {
                 struct swap_cluster_info *ci = NULL;
                 struct lruvec *lruvec;
-                int expected_refs;
 
                 if (old_order > 1) {
                         if (!list_empty(&folio->_deferred_list)) {
@@ -3853,8 +3843,8 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 
                         zone_device_private_split_cb(folio, new_folio);
 
-                        expected_refs = folio_expected_ref_count(new_folio) + 1;
-                        folio_ref_unfreeze(new_folio, expected_refs);
+                        folio_ref_unfreeze(new_folio,
+                                           folio_cache_ref_count(new_folio) + 1);
 
                         if (do_lru)
                                 lru_add_split_folio(folio, new_folio, lruvec, list);
@@ -3897,8 +3887,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
                  * Otherwise, a parallel folio_try_get() can grab @folio
                  * and its caller can see stale page cache entries.
                  */
-                expected_refs = folio_expected_ref_count(folio) + 1;
-                folio_ref_unfreeze(folio, expected_refs);
+                folio_ref_unfreeze(folio, folio_cache_ref_count(folio) + 1);
 
                 if (do_lru)
                         unlock_page_lruvec(lruvec);
@@ -3947,7 +3936,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
         struct folio *new_folio, *next;
         int nr_shmem_dropped = 0;
         int remap_flags = 0;
-        int extra_pins, ret;
+        int ret;
         pgoff_t end = 0;
@@ -4028,7 +4017,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
          * Racy check if we can split the page, before unmap_folio() will
          * split PMDs
          */
-        if (!can_split_folio(folio, 1, &extra_pins)) {
+        if (folio_expected_ref_count(folio) != folio_ref_count(folio) - 1) {
                 ret = -EAGAIN;
                 goto out_unlock;
         }
@@ -4051,8 +4040,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
         }
 
         ret = __folio_freeze_and_split_unmapped(folio, new_order, split_at, &xas, mapping,
-                                                true, list, split_type, end, &nr_shmem_dropped,
-                                                extra_pins);
+                                                true, list, split_type, end, &nr_shmem_dropped);
 fail:
         if (mapping)
                 xas_unlock(&xas);
@@ -4126,20 +4114,20 @@ out:
  */
 int folio_split_unmapped(struct folio *folio, unsigned int new_order)
 {
-        int extra_pins, ret = 0;
+        int ret = 0;
 
         VM_WARN_ON_ONCE_FOLIO(folio_mapped(folio), folio);
         VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
         VM_WARN_ON_ONCE_FOLIO(!folio_test_large(folio), folio);
         VM_WARN_ON_ONCE_FOLIO(!folio_test_anon(folio), folio);
 
-        if (!can_split_folio(folio, 1, &extra_pins))
+        if (folio_expected_ref_count(folio) != folio_ref_count(folio) - 1)
                 return -EAGAIN;
 
         local_irq_disable();
         ret = __folio_freeze_and_split_unmapped(folio, new_order, &folio->page,
                                                 NULL, NULL, false, NULL, SPLIT_TYPE_UNIFORM,
-                                                0, NULL, extra_pins);
+                                                0, NULL);
         local_irq_enable();
         return ret;
 }
@@ -4632,7 +4620,7 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
                  * can be split or not. So skip the check here.
                  */
                 if (!folio_test_private(folio) &&
-                    !can_split_folio(folio, 0, NULL))
+                    folio_expected_ref_count(folio) != folio_ref_count(folio))
                         goto next;
 
                 if (!folio_trylock(folio))
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 92980b072121..3b85652a42b9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1284,7 +1284,8 @@ retry:
                         goto keep_locked;
                 if (folio_test_large(folio)) {
                         /* cannot split folio, skip it */
-                        if (!can_split_folio(folio, 1, NULL))
+                        if (folio_expected_ref_count(folio) !=
+                            folio_ref_count(folio) - 1)
                                 goto activate_locked;
                         /*
                          * Split partially mapped folios right away.

From 2f78910659c72807b7ff03a2c0d121901bf55848 Mon Sep 17 00:00:00 2001
From: Zi Yan
Date: Wed, 26 Nov 2025 16:06:17 -0500
Subject: [PATCH 06/12] mm/huge_memory: make min_order_for_split() always
 return an order

min_order_for_split() returns -EBUSY when the folio is truncated and
cannot be split.  In commit 77008e1b2ef7 ("mm/huge_memory: do not change
split_huge_page*() target order silently"), memory_failure() does not
handle it and passes -EBUSY to try_to_split_thp_page() directly.
try_to_split_thp_page() returns -EINVAL, since -EBUSY becomes 0xfffffff0
as new_order is unsigned int in __folio_split() and this large new_order
is rejected as an invalid input.  The code does not cause a bug.
soft_offline_in_use_page() also uses min_order_for_split(), but it always
passes 0 as new_order for split.

Fix it by making min_order_for_split() always return an order.  When the
given folio is truncated, namely folio->mapping == NULL, return 0 and let
a subsequent split function handle the situation and return -EBUSY.

Add kernel-doc to min_order_for_split() to clarify its use.
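[Editorial aside, not part of the patch: the unsigned conversion described
above can be reproduced with a few lines of ordinary C.  The snippet below
assumes the Linux errno value EBUSY == 16 and is an illustration, not
kernel code.]

  #include <stdio.h>

  #define EBUSY 16      /* Linux errno value, assumed here for illustration */

  int main(void)
  {
          int ret = -EBUSY;               /* what min_order_for_split() used to return */
          unsigned int new_order = ret;   /* the implicit conversion in the old call chain */

          /* prints "new_order = 0xfffffff0", which then fails new_order >= old_order */
          printf("new_order = 0x%x\n", new_order);
          return 0;
  }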
Link: https://lkml.kernel.org/r/20251126210618.1971206-4-ziy@nvidia.com
Signed-off-by: Zi Yan
Reviewed-by: Wei Yang
Acked-by: David Hildenbrand (Red Hat)
Reviewed-by: Lorenzo Stoakes
Cc: Balbir Singh
Cc: Baolin Wang
Cc: Barry Song
Cc: Dev Jain
Cc: Lance Yang
Cc: Liam Howlett
Cc: Miaohe Lin
Cc: Naoya Horiguchi
Cc: Nico Pache
Cc: Ryan Roberts
Signed-off-by: Andrew Morton
---
 include/linux/huge_mm.h |  6 +++---
 mm/huge_memory.c        | 25 +++++++++++++++++++------
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 8a52e20387b0..21162493a0a0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -372,7 +372,7 @@ enum split_type {
 int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
                 unsigned int new_order);
 int folio_split_unmapped(struct folio *folio, unsigned int new_order);
-int min_order_for_split(struct folio *folio);
+unsigned int min_order_for_split(struct folio *folio);
 int split_folio_to_list(struct folio *folio, struct list_head *list);
 int folio_check_splittable(struct folio *folio, unsigned int new_order,
                 enum split_type split_type);
@@ -630,10 +630,10 @@ static inline int split_huge_page(struct page *page)
         return -EINVAL;
 }
 
-static inline int min_order_for_split(struct folio *folio)
+static inline unsigned int min_order_for_split(struct folio *folio)
 {
         VM_WARN_ON_ONCE_FOLIO(1, folio);
-        return -EINVAL;
+        return 0;
 }
 
 static inline int split_folio_to_list(struct folio *folio, struct list_head *list)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5ce00d53b19e..1a3273491cc5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4219,16 +4219,29 @@ int folio_split(struct folio *folio, unsigned int new_order,
                              SPLIT_TYPE_NON_UNIFORM);
 }
 
-int min_order_for_split(struct folio *folio)
+/**
+ * min_order_for_split() - get the minimum order @folio can be split to
+ * @folio: folio to split
+ *
+ * min_order_for_split() tells the minimum order @folio can be split to.
+ * If a file-backed folio is truncated, 0 will be returned.  Any subsequent
+ * split attempt should get -EBUSY from split checking code.
+ *
+ * Return: @folio's minimum order for split
+ */
+unsigned int min_order_for_split(struct folio *folio)
 {
         if (folio_test_anon(folio))
                 return 0;
 
-        if (!folio->mapping) {
-                if (folio_test_pmd_mappable(folio))
-                        count_vm_event(THP_SPLIT_PAGE_FAILED);
-                return -EBUSY;
-        }
+        /*
+         * If the folio got truncated, we don't know the previous mapping and
+         * consequently the old min order. But it doesn't matter, as any split
+         * attempt will immediately fail with -EBUSY as the folio cannot get
+         * split until freed.
+         */
+        if (!folio->mapping)
+                return 0;
 
         return mapping_min_folio_order(folio->mapping);
 }

From 9dcdc0c207fe32c576f1359deaf0efece9f36ca2 Mon Sep 17 00:00:00 2001
From: Zi Yan
Date: Wed, 26 Nov 2025 16:06:18 -0500
Subject: [PATCH 07/12] mm/huge_memory: fix folio split stats counting

The "return <errno>" statements for error checks at the beginning of
__folio_split() skip the necessary count_vm_event() and count_mthp_stat()
at the end of the function.  Fix these by replacing them with
"ret = <errno>; goto out;".
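[Editorial aside, not part of the patch: the fix follows the usual
single-exit pattern, routing every failure through the label that does the
accounting instead of returning early past it.  A minimal stand-alone
sketch, with a plain counter standing in for count_vm_event() and
count_mthp_stat():]

  #include <stdio.h>

  static int split_failures;    /* stands in for the vm event / mTHP stat counters */

  static int do_split(unsigned int new_order, unsigned int old_order)
  {
          int ret = 0;

          if (new_order >= old_order) {
                  ret = -22;    /* -EINVAL; an early "return -EINVAL" would skip the accounting below */
                  goto out;
          }
          /* ... the actual split work would go here ... */
  out:
          if (ret)
                  split_failures++;
          return ret;
  }

  int main(void)
  {
          do_split(2, 2);
          printf("split failures counted: %d\n", split_failures);  /* 1 */
          return 0;
  }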
Link: https://lkml.kernel.org/r/20251126210618.1971206-5-ziy@nvidia.com
Signed-off-by: Zi Yan
Reviewed-by: Wei Yang
Reviewed-by: Lorenzo Stoakes
Acked-by: David Hildenbrand (Red Hat)
Cc: Balbir Singh
Cc: Baolin Wang
Cc: Barry Song
Cc: Dev Jain
Cc: Lance Yang
Cc: Liam Howlett
Cc: Miaohe Lin
Cc: Naoya Horiguchi
Cc: Nico Pache
Cc: Ryan Roberts
Signed-off-by: Andrew Morton
---
 mm/huge_memory.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1a3273491cc5..8db0d81fca40 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3942,16 +3942,20 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
         VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
         VM_WARN_ON_ONCE_FOLIO(!folio_test_large(folio), folio);
 
-        if (folio != page_folio(split_at) || folio != page_folio(lock_at))
-                return -EINVAL;
+        if (folio != page_folio(split_at) || folio != page_folio(lock_at)) {
+                ret = -EINVAL;
+                goto out;
+        }
 
-        if (new_order >= old_order)
-                return -EINVAL;
+        if (new_order >= old_order) {
+                ret = -EINVAL;
+                goto out;
+        }
 
         ret = folio_check_splittable(folio, new_order, split_type);
         if (ret) {
                 VM_WARN_ONCE(ret == -EINVAL, "Tried to split an unsplittable folio");
-                return ret;
+                goto out;
         }
 
         if (is_anon) {

From 40a4af52e0472dfc114aa78d6f3debec70b42048 Mon Sep 17 00:00:00 2001
From: Lukas Bulwahn
Date: Mon, 1 Dec 2025 13:29:22 +0100
Subject: [PATCH 08/12] mm: fix CONFIG_STACK_GROWSUP typo in mm.h

Commit 2b6a3f061f11 ("mm: declare VMA flags by bit") significantly
refactors the header file include/linux/mm.h.  In that step, it introduces
a typo in an ifdef, referring to a non-existing config option
STACK_GROWS_UP, whereas the actual config option is called STACK_GROWSUP.

Fix this typo in the mm header file.

Link: https://lkml.kernel.org/r/20251201122922.352480-1-lukas.bulwahn@redhat.com
Fixes: 2b6a3f061f11 ("mm: declare VMA flags by bit")
Signed-off-by: Lukas Bulwahn
Acked-by: David Hildenbrand (Red Hat)
Reviewed-by: Lorenzo Stoakes
Cc: Alice Ryhl
Cc: Liam Howlett
Cc: Michal Hocko
Cc: Mike Rapoport
Cc: Suren Baghdasaryan
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
---
 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2887d3b34d3e..03f7f92d08c8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -438,7 +438,7 @@ enum {
 #define VM_NOHUGEPAGE   INIT_VM_FLAG(NOHUGEPAGE)
 #define VM_MERGEABLE    INIT_VM_FLAG(MERGEABLE)
 #define VM_STACK        INIT_VM_FLAG(STACK)
-#ifdef CONFIG_STACK_GROWS_UP
+#ifdef CONFIG_STACK_GROWSUP
 #define VM_STACK_EARLY  INIT_VM_FLAG(STACK_EARLY)
 #else
 #define VM_STACK_EARLY  VM_NONE

From 9ee5d1766c8bfa4924bd47e31c4dd193493f5a45 Mon Sep 17 00:00:00 2001
From: Shameer Kolothum
Date: Tue, 25 Nov 2025 17:13:50 +0000
Subject: [PATCH 09/12] mm/hugetlb: fix incorrect error return from
 hugetlb_reserve_pages()

The function hugetlb_reserve_pages() returns the number of pages added to
the reservation map on success and a negative error code on failure
(e.g. -EINVAL, -ENOMEM).  However, in some error paths, it may return -1
directly.  For example, a failure at:

        if (hugetlb_acct_memory(h, gbl_reserve) < 0)
                goto out_put_pages;

results in returning -1 (since add = -1), which may be misinterpreted in
userspace as -EPERM.

Fix this by explicitly capturing and propagating the return values from
helper functions, and using -EINVAL for all other failure cases.
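[Editorial aside, not part of the patch: a raw -1 reads as "permission
denied" simply because errno 1 is EPERM, so any caller that decodes a
negative return as -errno maps -1 to EPERM.  A small userspace
illustration:]

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
          long ret = -1;        /* a raw -1 leaking out of an error path */

          /* A caller decoding negative returns as -errno sees EPERM. */
          if (ret < 0)
                  printf("errno %ld: %s\n", -ret, strerror(-ret));
          return 0;
  }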
Link: https://lkml.kernel.org/r/20251125171350.86441-1-skolothumtho@nvidia.com
Fixes: 986f5f2b4be3 ("mm/hugetlb: make hugetlb_reserve_pages() return nr of entries updated")
Signed-off-by: Shameer Kolothum
Reviewed-by: Joshua Hahn
Reviewed-by: Jason Gunthorpe
Acked-by: Oscar Salvador
Cc: Matthew R. Ochs
Cc: Muchun Song
Cc: Nicolin Chen
Cc: Vivek Kasireddy
Signed-off-by: Andrew Morton
---
 mm/hugetlb.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9e7815b4f058..51273baec9e5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6579,6 +6579,7 @@ long hugetlb_reserve_pages(struct inode *inode,
         struct resv_map *resv_map;
         struct hugetlb_cgroup *h_cg = NULL;
         long gbl_reserve, regions_needed = 0;
+        int err;
 
         /* This should never happen */
         if (from > to) {
@@ -6612,8 +6613,10 @@ long hugetlb_reserve_pages(struct inode *inode,
         } else {
                 /* Private mapping. */
                 resv_map = resv_map_alloc();
-                if (!resv_map)
+                if (!resv_map) {
+                        err = -ENOMEM;
                         goto out_err;
+                }
 
                 chg = to - from;
 
@@ -6621,11 +6624,15 @@ long hugetlb_reserve_pages(struct inode *inode,
                 set_vma_desc_resv_flags(desc, HPAGE_RESV_OWNER);
         }
 
-        if (chg < 0)
+        if (chg < 0) {
+                /* region_chg() above can return -ENOMEM */
+                err = (chg == -ENOMEM) ? -ENOMEM : -EINVAL;
                 goto out_err;
+        }
 
-        if (hugetlb_cgroup_charge_cgroup_rsvd(hstate_index(h),
-                                chg * pages_per_huge_page(h), &h_cg) < 0)
+        err = hugetlb_cgroup_charge_cgroup_rsvd(hstate_index(h),
+                                chg * pages_per_huge_page(h), &h_cg);
+        if (err < 0)
                 goto out_err;
 
         if (desc && !(desc->vm_flags & VM_MAYSHARE) && h_cg) {
@@ -6641,14 +6648,17 @@ long hugetlb_reserve_pages(struct inode *inode,
          * reservations already in place (gbl_reserve).
          */
         gbl_reserve = hugepage_subpool_get_pages(spool, chg);
-        if (gbl_reserve < 0)
+        if (gbl_reserve < 0) {
+                err = gbl_reserve;
                 goto out_uncharge_cgroup;
+        }
 
         /*
          * Check enough hugepages are available for the reservation.
          * Hand the pages back to the subpool if there are not
          */
-        if (hugetlb_acct_memory(h, gbl_reserve) < 0)
+        err = hugetlb_acct_memory(h, gbl_reserve);
+        if (err < 0)
                 goto out_put_pages;
 
         /*
@@ -6667,6 +6677,7 @@ long hugetlb_reserve_pages(struct inode *inode,
 
                 if (unlikely(add < 0)) {
                         hugetlb_acct_memory(h, -gbl_reserve);
+                        err = add;
                         goto out_put_pages;
                 } else if (unlikely(chg > add)) {
                         /*
@@ -6726,7 +6737,7 @@ out_err:
                         kref_put(&resv_map->refs, resv_map_release);
                 set_vma_desc_resv_map(desc, NULL);
         }
-        return chg < 0 ? chg : add < 0 ? add : -EINVAL;
+        return err;
 }
 
 long hugetlb_unreserve_pages(struct inode *inode, long start, long end,

From 12c1fa8d4631e5fa8d1611379fc6babb558755e1 Mon Sep 17 00:00:00 2001
From: Lukas Bulwahn
Date: Wed, 5 Nov 2025 11:58:57 +0100
Subject: [PATCH 10/12] MAINTAINERS: add idr core-api doc file to XARRAY

The files in Documentation/core-api/ are by virtue of their top-level
directory part of the Documentation section in MAINTAINERS.  Each file in
Documentation/core-api/ should, however, also have a further section in
MAINTAINERS it belongs to, one that fits the technical area of the API
documented in that file.

idr.rst provides some explanation of the ID allocation API defined in
lib/idr.c, which itself is part of the XARRAY section.  Add this core-api
document to XARRAY.
Link: https://lkml.kernel.org/r/20251105105857.156950-1-lukas.bulwahn@redhat.com
Signed-off-by: Lukas Bulwahn
Cc: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 302c57deffac..233ab04e8716 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27879,6 +27879,7 @@ M:     Matthew Wilcox
 L:      linux-fsdevel@vger.kernel.org
 L:      linux-mm@kvack.org
 S:      Supported
+F:      Documentation/core-api/idr.rst
 F:      Documentation/core-api/xarray.rst
 F:      include/linux/idr.h
 F:      include/linux/xarray.h

From 49d921b471c51316ccfd659f4d81efbbbe3613db Mon Sep 17 00:00:00 2001
From: Chen Ridong
Date: Thu, 4 Dec 2025 12:23:55 +0000
Subject: [PATCH 11/12] mm: vmscan: correct nr_requested tracing in scan_folios

When enabling vmscan tracing, it is observed that nr_requested is always
4096, which is confusing:

  mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
  mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
  mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
  mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
  mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
  mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
  mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...

This is because the tracepoint prints MAX_LRU_BATCH, which is meaningless
as it is a constant.  To fix this, modify it to print the capped value.

Link: https://lkml.kernel.org/r/20251204122355.1822919-1-chenridong@huaweicloud.com
Fixes: 8c2214fc9a47 ("mm: multi-gen LRU: reuse some legacy trace events")
Signed-off-by: Chen Ridong
Acked-by: David Hildenbrand (Red Hat)
Reviewed-by: Lance Yang
Cc: Axel Rasmussen
Cc: Jaewon Kim
Cc: Johannes Weiner
Cc: Lorenzo Stoakes
Cc: Lu Jialin
Cc: Michal Hocko
Cc: Qi Zheng
Cc: Shakeel Butt
Cc: Wei Xu
Cc: Yuanchu Xie
Cc: Yu Zhao
Signed-off-by: Andrew Morton
---
 mm/vmscan.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3b85652a42b9..7fcc0120a72b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4541,7 +4541,8 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
         int scanned = 0;
         int isolated = 0;
         int skipped = 0;
-        int remaining = min(nr_to_scan, MAX_LRU_BATCH);
+        int scan_batch = min(nr_to_scan, MAX_LRU_BATCH);
+        int remaining = scan_batch;
         struct lru_gen_folio *lrugen = &lruvec->lrugen;
         struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 
@@ -4601,7 +4602,7 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
         count_memcg_events(memcg, item, isolated);
         count_memcg_events(memcg, PGREFILL, sorted);
         __count_vm_events(PGSCAN_ANON + type, isolated);
-        trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, MAX_LRU_BATCH,
+        trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, scan_batch,
                                     scanned, skipped, isolated,
                                     type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
 
         if (type == LRU_GEN_FILE)

From dafdba0964bd10913fbaa5537201cbbe05df5b9c Mon Sep 17 00:00:00 2001
From: Arnd Bergmann
Date: Thu, 4 Dec 2025 11:03:54 +0100
Subject: [PATCH 12/12] mm/damon/tests/core-kunit: avoid damos_test_commit
 stack warning

The newly added damos_test_commit() constructs multiple large structures
on the stack, which exceeds the warning limit in some cases:

  In file included from mm/damon/core.c:2941:
  mm/damon/tests/core-kunit.h: In function 'damos_test_commit':
  mm/damon/tests/core-kunit.h:965:1: error: the frame size of 1520 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]

Split this function up into two separate ones that are called
sequentially, so they can occupy the same stack slots.

Link: https://lkml.kernel.org/r/20251204100403.1034980-1-arnd@kernel.org
Fixes: 299a88f6ec13 ("mm/damon/tests/core-kunit: add damos_commit() test")
Signed-off-by: Arnd Bergmann
Reviewed-by: SeongJae Park
Cc: Quanmin Yan
Signed-off-by: Andrew Morton
---
 mm/damon/tests/core-kunit.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/damon/tests/core-kunit.h b/mm/damon/tests/core-kunit.h
index a1eff023e928..8cb369b63e08 100644
--- a/mm/damon/tests/core-kunit.h
+++ b/mm/damon/tests/core-kunit.h
@@ -924,7 +924,7 @@ static void damos_test_commit_for(struct kunit *test, struct damos *dst,
         }
 }
 
-static void damos_test_commit(struct kunit *test)
+static void damos_test_commit_pageout(struct kunit *test)
 {
         damos_test_commit_for(test,
                         &(struct damos){
@@ -945,6 +945,10 @@ static void damos_test_commit(struct kunit *test)
                                 DAMOS_WMARK_FREE_MEM_RATE, 800, 50, 30},
                         });
+}
+
+static void damos_test_commit_migrate_hot(struct kunit *test)
+{
         damos_test_commit_for(test,
                         &(struct damos){
                                 .pattern = (struct damos_access_pattern){
@@ -1230,7 +1234,8 @@ static struct kunit_case damon_test_cases[] = {
         KUNIT_CASE(damos_test_commit_quota),
         KUNIT_CASE(damos_test_commit_dests),
         KUNIT_CASE(damos_test_commit_filter),
-        KUNIT_CASE(damos_test_commit),
+        KUNIT_CASE(damos_test_commit_pageout),
+        KUNIT_CASE(damos_test_commit_migrate_hot),
         KUNIT_CASE(damon_test_commit_target_regions),
         KUNIT_CASE(damos_test_filter_out),
         KUNIT_CASE(damon_test_feed_loop_next_input),
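[Editorial aside on the patch above, illustrative only and with made-up
sizes: compound literals live until the end of the enclosing block, so two
large ones in a single function can occupy two frame slots at once, while
splitting them into two functions called back to back keeps at most one
large object live at a time.]

  #include <stdio.h>

  struct big { char buf[800]; };

  static void consume(const struct big *b)
  {
          (void)b;
  }

  /* One function with two large compound literals: both may need frame
   * space at the same time, roughly doubling the frame size. */
  static void one_function(void)
  {
          consume(&(struct big){ .buf = "first" });
          consume(&(struct big){ .buf = "second" });
  }

  /* Split into two functions called sequentially: only one frame (and one
   * big object) is live at any time, as in the damos_test_commit() split. */
  static void part_one(void)
  {
          consume(&(struct big){ .buf = "first" });
  }

  static void part_two(void)
  {
          consume(&(struct big){ .buf = "second" });
  }

  int main(void)
  {
          one_function();
          part_one();
          part_two();
          printf("sizeof(struct big) = %zu\n", sizeof(struct big));
          return 0;
  }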