mirror-linux

Wupeng Ma 3c2d42b8ee mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison

Two concurrent madvise(MADV_HWPOISON) calls on the same hugetlb page can trigger a recursive spinlock self-deadlock (AA deadlock) on hugetlb_lock when racing with a concurrent unmap:

  thread#0                              thread#1
  --------                              --------
  madvise(folio, MADV_HWPOISON)
  -> poisons the folio successfully
                                        madvise(folio, MADV_HWPOISON)
  unmap(folio)
                                        try_memory_failure_hugetlb
                                        get_huge_page_for_hwpoison
                                        spin_lock_irq(&hugetlb_lock) <- held
                                        __get_huge_page_for_hwpoison
                                        hugetlb_update_hwpoison()
                                        -> MF_HUGETLB_FOLIO_PRE_POISONED
                                        goto out:
                                        folio_put()
                                        refcount: 1 -> 0
                                        free_huge_folio()
                                        spin_lock_irqsave(&hugetlb_lock)
                                        -> AA DEADLOCK!

The out: path in __get_huge_page_for_hwpoison() calls folio_put() to drop the GUP reference while the hugetlb_lock is still held by the hugetlb.c wrapper get_huge_page_for_hwpoison(). If concurrent unmap has released the page table mapping reference, folio_put() drops the folio refcount to zero, triggering free_huge_folio() which attempts to re-acquire the non-recursive hugetlb_lock.

Fix this by moving hugetlb_lock acquisition from the hugetlb.c wrapper into get_huge_page_for_hwpoison(). Place spin_unlock_irq() before the folio_put() at the out: label so the folio is always released outside the lock.
[akpm@linux-foundation.org: fix race, rename label per Miaohe]
Link: https://sashiko.dev/#/patchset/20260522010305.4099834-1-mawupeng1@huawei.com
Link: https://lore.kernel.org/f39f405e-4b4b-8f79-70fe-a2b5b62114eb@huawei.com
Link: https://lore.kernel.org/20260522010305.4099834-1-mawupeng1@huawei.com
Fixes: 405ce05123 ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
Acked-by: Oscar Salvador (SUSE) <osalvador@kernel.org>
Acked-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-28 20:50:34 -07:00
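The lock ordering described in the commit message can be illustrated with a small userspace model. This is a hedged sketch, not the kernel code: model_lock, struct folio_model, model_put(), model_free(), get_for_hwpoison_buggy() and get_for_hwpoison_fixed() are all hypothetical names. The model covers only the "already poisoned" exit path. It uses a POSIX error-checking mutex in place of hugetlb_lock, so a self-relock is reported as EDEADLK instead of spinning forever as a non-recursive kernel spinlock would.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code: model_lock stands in for
 * hugetlb_lock and struct folio_model for a hugetlb folio. An error-checking
 * mutex returns EDEADLK when the owning thread tries to lock it again.
 */
static pthread_mutex_t model_lock;

struct folio_model {
	int refcount;		/* references held on the folio */
	bool hwpoisoned;	/* set once the first MADV_HWPOISON succeeded */
	bool freed;		/* set by the release path */
};

static void model_lock_init(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&model_lock, &attr);
	pthread_mutexattr_destroy(&attr);
}

/* Release path: like free_huge_folio(), it takes the lock itself. */
static int model_free(struct folio_model *f)
{
	int err = pthread_mutex_lock(&model_lock);

	if (err)
		return err;	/* EDEADLK: this thread already holds the lock */
	f->freed = true;
	pthread_mutex_unlock(&model_lock);
	return 0;
}

/* Drop one reference; dropping the last one runs the release path. */
static int model_put(struct folio_model *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		return model_free(f);
	return 0;
}

/* Old ordering: the pre-poisoned exit drops the reference under the lock. */
static int get_for_hwpoison_buggy(struct folio_model *f)
{
	int err = 0;

	pthread_mutex_lock(&model_lock);
	if (f->hwpoisoned)
		err = model_put(f);	/* may re-enter model_lock: AA deadlock */
	pthread_mutex_unlock(&model_lock);
	return err;
}

/* Fixed ordering: unlock first, then drop the reference outside the lock. */
static int get_for_hwpoison_fixed(struct folio_model *f)
{
	bool pre_poisoned;

	pthread_mutex_lock(&model_lock);
	pre_poisoned = f->hwpoisoned;
	pthread_mutex_unlock(&model_lock);
	if (pre_poisoned)
		return model_put(f);
	return 0;
}
```

With refcount 1 (the concurrent unmap has already dropped the mapping reference), the buggy variant hits the self-relock and returns EDEADLK without freeing, while the fixed variant frees the folio cleanly. With a higher refcount both variants succeed, which is why the bug only shows up when racing with an unmap.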
acpi	Merge branches 'acpi-apei', 'acpi-bus', 'acpi-cppc' and 'acpi-video'	2026-04-30 21:07:06 +02:00
asm-generic	ring-buffer: Flush and stop persistent ring buffer on panic	2026-05-21 08:20:58 -04:00
clocksource	…
crypto	crypto/krb5, rxrpc: Fix lack of pre-decrypt/pre-verify length checks	2026-05-20 16:36:45 -07:00
cxl	…
drm	Short summary of fixes pull:	2026-05-22 07:01:04 +10:00
dt-bindings	We've finally gotten rid of the struct clk_ops::round_rate() code after months	2026-04-21 08:33:26 -07:00
hyperv	x86/hyperv: Skip LP/VP creation on kexec	2026-04-22 06:23:25 +00:00
keys	…
kunit	…
kvm	…
linux	mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison	2026-05-28 20:50:34 -07:00
math-emu	…
media	…
memory	…
misc	…
net	13 hotfixes. 9 are for MM. 9 are cc:stable and the remaining 4 address	2026-05-26 08:23:19 -07:00
pcmcia	…
ras	…
rdma	RDMA/core: Do not read wild stack memory in uverbs_get_handler_fn()	2026-05-19 19:32:48 -03:00
rv	…
scsi	…
soc	…
sound	ASoC: Fixes for v7.1	2026-04-23 09:34:28 +02:00
target	…
trace	Including fixes from Bluetooth, wireless and netfilter.	2026-05-21 14:39:12 -07:00
uapi	Miscellaneous scheduler fixes:	2026-05-08 19:42:10 -07:00
ufs	scsi: ufs: core: Fix bRefClkFreq write failure in HS-LSS mode	2026-04-21 20:58:06 -04:00
vdso	…
video	fbdev: udlfb: add vm_ops to dlfb_ops_mmap to prevent use-after-free	2026-05-04 10:35:55 +02:00
xen	xen/arm: Replace __ASSEMBLY__ with __ASSEMBLER__ in interface.h	2026-05-12 17:31:38 +02:00
Kbuild	…