mirror-linux

Commit Graph

Author	SHA1	Message	Date
Andrew Price	9126d2754c	gfs2: Don't clear sb->s_fs_info in gfs2_sys_fs_add When gfs2_sys_fs_add() fails, it sets sb->s_fs_info to NULL on its error path (see commit `0d515210b6` ("GFS2: Add kobject release method")). The intention seems to be to prevent dereferencing sb->s_fs_info once the object pointed to has been deallocated, but that would be better achieved by setting the pointer to NULL in free_sbd(). As a consequence, when the call to gfs2_sys_fs_add() fails in gfs2_fill_super(), sdp = GFS2_SB(inode) will evaluate to NULL in iput() -> gfs2_drop_inode(), and accessing sdp->sd_flags will be a NULL pointer dereference. Fix that by only setting sb->s_fs_info to NULL when actually freeing the object pointed to in free_sbd(). Fixes: `ae9f3bd825` ("gfs2: replace sd_aspace with sd_inode") Reported-by: syzbot+b12826218502df019f9d@syzkaller.appspotmail.com Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-30 19:20:20 +02:00
Linus Torvalds	8fdabcd9c0	gfs2 changes - Fix the long-standing warnings in inode_to_wb() when CONFIG_LOCKDEP is enabled: gfs2 doesn't support cgroup writeback and so inode->i_wb will never change. This is the counterpart of commit `9e888998ea` ("writeback: fix false warning in inode_to_wb()"). - Fix a hang introduced by commit `8d391972ae` ("gfs2: Remove __gfs2_writepage()"): prevent gfs2_logd from creating transactions for jdata pages while trying to flush the log. - Fix a race between gfs2_create_inode() and gfs2_evict_inode() by deallocating partially created inodes on the gfs2_create_inode() error path. - Fix a bug in the journal head lookup code that could cause mount to fail after successful recovery. - Various smaller fixes and cleanups from various people. -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmgw3W4UHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrtfQ//Zua7sZEK3BszDJ5E0yr/hj2wLNBJ Fu/ehuymgoNaDcZgM9OhRw/l3cuTsEOoxv+Qreb3eHkfNgBfq25ij4IXRKyLuxjo rTcH8PdTjp9drQ9+lhKO19qCnQ1naHem5r1mvVfsbo1aAN3Wj8Ng3qAen+RbvKq+ DN6Ox+DG+DBxKXg/KVhMyIifykbOWcQbkjdgzAIju4jhBGHmwI19qyyMBQU9gJP2 sa1vUa9ai9bhir0wHBNWyFqjvVNfPzlLZ9qetOJFF6+EF0wiDTTbHg/6ueW3wr5u 1I+ndP4RLJ0QXnoaxVozRqVnBeaW4V227Rrxie/5ZV5HusT5+Wky3VzGwaIHgy9b VX825Aff/DzgKLRblk/Rvqfnq2oVHkaNJJtBZgcliyxfKwRVSOV7AHTXSE+tzWHR DIe08FguliTWsKyPsTuLrwg5B2VTww+7FoQ7A3+L6dd6lQBN3JYsl1lxpIz3+k9N yqMiCQQc47HmTm9TJEUkcWmY2/qTRPeQnhZtPnLmv93vNrhoC2aL2ULesN80ferD iJb3gDVRsZKkCGuKALxEgBCubE5KyQi+64Rie1AHqHmR93lpINxMFjI3Wj4k9n+X LPLyT8F9prhnlITqgAMxVjbUXvz09l9aW4JOD4cpB2D7aTMBGrBx7ZMhS4MVajov DoXGzxv+FSxravk= =0EFZ -----END PGP SIGNATURE----- Merge tag 'gfs2-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix the long-standing warnings in inode_to_wb() when CONFIG_LOCKDEP is enabled: gfs2 doesn't support cgroup writeback and so inode->i_wb will never change. This is the counterpart of commit `9e888998ea` ("writeback: fix false warning in inode_to_wb()") - Fix a hang introduced by commit `8d391972ae` ("gfs2: Remove __gfs2_writepage()"): prevent gfs2_logd from creating transactions for jdata pages while trying to flush the log - Fix a race between gfs2_create_inode() and gfs2_evict_inode() by deallocating partially created inodes on the gfs2_create_inode() error path - Fix a bug in the journal head lookup code that could cause mount to fail after successful recovery - Various smaller fixes and cleanups from various people * tag 'gfs2-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (23 commits) gfs2: No more gfs2_find_jhead caching gfs2: Get rid of duplicate log head lookup gfs2: Simplify clean_journal gfs2: Simplify gfs2_log_pointers_init gfs2: Move gfs2_log_pointers_init gfs2: Minor comments fix gfs2: Don't start unnecessary transactions during log flush gfs2: Move gfs2_trans_add_databufs gfs2: Rename jdata_dirty_folio to gfs2_jdata_dirty_folio gfs2: avoid inefficient use of crc32_le_shift() gfs2: Do not call iomap_zero_range beyond eof gfs: don't check for AOP_WRITEPAGE_ACTIVATE in gfs2_write_jdata_batch gfs2: Fix usage of bio->bi_status in gfs2_end_log_write gfs2: deallocate inodes in gfs2_create_inode gfs2: Move GIF_ALLOC_FAILED check out of gfs2_ea_dealloc gfs2: Move gfs2_dinode_dealloc gfs2: Don't reread inodes unnecessarily gfs2: gfs2_create_inode error handling fix gfs2: Remove unnecessary NULL check before free_percpu() gfs2: check sb_min_blocksize return value ...	2025-05-26 12:35:08 -07:00
Linus Torvalds	6f59de9bc0	for-6.16/block-20250523 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmgwnGYQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpq9aD/4iqOts77xhWWLrOJWkkhOcV5rREeyppq8X MKYul9S4cc4Uin9Xou9a+nab31QBQEk3nsN3kX9o3yAXvkh6yUm36HD8qYNW/46q IUkwRQQJ0COyTnexMZQNTbZPQDIYcenXmQxOcrEJ5jC1Jcz0sOKHsgekL+ab3kCy fLnuz2ozvjGDMala/NmE8fN5qSlj4qQABHgbamwlwfo4aWu07cwfqn5G/FCYJgDO xUvsnTVclom2g4G+7eSSvGQI1QyAxl5QpviPnj/TEgfFBFnhbCSoBTEY6ecqhlfW 6u59MF/Uw8E+weiuGY4L87kDtBhjQs3UMSLxCuwH7MxXb25ff7qB4AIkcFD0kKFH 3V5NtwqlU7aQT0xOjGxaHhfPwjLD+FVss4ARmuHS09/Kn8egOW9yROPyetnuH84R Oz0Ctnt1IPLFjvGeg3+rt9fjjS9jWOXLITb9Q6nX9gnCt7orCwIYke8YCpmnJyhn i+fV4CWYIQBBRKxIT0E/GhJxZOmL0JKpomnbpP2dH8npemnsTCuvtfdrK9gfhH2X chBVqCPY8MNU5zKfzdEiavPqcm9392lMzOoOXW2pSC1eAKqnAQ86ZT3r7rLntqE8 75LxHcvaQIsnpyG+YuJVHvoiJ83TbqZNpyHwNaQTYhDmdYpp2d/wTtTQywX4DuXb Y6NDJw5+kQ== =1PNK -----END PGP SIGNATURE----- Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - ublk updates: - Add support for updating the size of a ublk instance - Zero-copy improvements - Auto-registering of buffers for zero-copy - Series simplifying and improving GET_DATA and request lookup - Series adding quiesce support - Lots of selftests additions - Various cleanups - NVMe updates via Christoph: - add per-node DMA pools and use them for PRP/SGL allocations (Caleb Sander Mateos, Keith Busch) - nvme-fcloop refcounting fixes (Daniel Wagner) - support delayed removal of the multipath node and optionally support the multipath node for private namespaces (Nilay Shroff) - support shared CQs in the PCI endpoint target code (Wilfred Mallawa) - support admin-queue only authentication (Hannes Reinecke) - use the crc32c library instead of the crypto API (Eric Biggers) - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes Reinecke, Leon Romanovsky, Gustavo A. R. Silva) - MD updates via Yu: - Fix that normal IO can be starved by sync IO, found by mkfs on newly created large raid5, with some clean up patches for bdev inflight counters - Clean up brd, getting rid of atomic kmaps and bvec poking - Add loop driver specifically for zoned IO testing - Eliminate blk-rq-qos calls with a static key, if not enabled - Improve hctx locking for when a plug has IO for multiple queues pending - Remove block layer bouncing support, which in turn means we can remove the per-node bounce stat as well - Improve blk-throttle support - Improve delay support for blk-throttle - Improve brd discard support - Unify IO scheduler switching. This should also fix a bunch of lockdep warnings we've been seeing, after enabling lockdep support for queue freezing/unfreezeing - Add support for block write streams via FDP (flexible data placement) on NVMe - Add a bunch of block helpers, facilitating the removal of a bunch of duplicated boilerplate code - Remove obsolete BLK_MQ pci and virtio Kconfig options - Add atomic/untorn write support to blktrace - Various little cleanups and fixes * tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits) selftests: ublk: add test for UBLK_F_QUIESCE ublk: add feature UBLK_F_QUIESCE selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE traceevent/block: Add REQ_ATOMIC flag to block trace events ublk: run auto buf unregisgering in same io_ring_ctx with registering io_uring: add helper io_uring_cmd_ctx_handle() ublk: remove io argument from ublk_auto_buf_reg_fallback() ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch() selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK selftests: ublk: support UBLK_F_AUTO_BUF_REG ublk: support UBLK_AUTO_BUF_REG_FALLBACK ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG ublk: prepare for supporting to register request buffer automatically ublk: convert to refcount_t selftests: ublk: make IO & device removal test more stressful nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk nvme: introduce multipath_always_on module param nvme-multipath: introduce delayed removal of the multipath head node nvme-pci: derive and better document max segments limits nvme-pci: use struct_size for allocation struct nvme_dev ...	2025-05-26 11:39:36 -07:00
Linus Torvalds	8dd53535f1	vfs-6.16-rc1.super -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPTwAKCRCRxhvAZXjc oi3BAQD/IBxTbAZIe7vEAsuLlBoKbWrzPGvxzd4UeMGo6OY18wEAvvyJM+arQy51 jS0ZErDOJnPNe7jps+Gh+WDx6d3NMAY= =lqAG -----END PGP SIGNATURE----- Merge tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs freezing updates from Christian Brauner: "This contains various filesystem freezing related work for this cycle: - Allow the power subsystem to support filesystem freeze for suspend and hibernate. Now all the pieces are in place to actually allow the power subsystem to freeze/thaw filesystems during suspend/resume. Filesystems are only frozen and thawed if the power subsystem does actually own the freeze. If the filesystem is already frozen by the time we've frozen all userspace processes we don't care to freeze it again. That's userspace's job once the process resumes. We only actually freeze filesystems if we absolutely have to and we ignore other failures to freeze. We could bubble up errors and fail suspend/resume if the error isn't EBUSY (aka it's already frozen) but I don't think that this is worth it. Filesystem freezing during suspend/resume is best-effort. If the user has 500 ext4 filesystems mounted and 4 fail to freeze for whatever reason then we simply skip them. What we have now is already a big improvement and let's see how we fare with it before making our lives even harder (and uglier) than we have to. - Allow efivars to support freeze and thaw Allow efivarfs to partake to resync variable state during system hibernation and suspend. Add freeze/thaw support. This is a pretty straightforward implementation. We simply add regular freeze/thaw support for both userspace and the kernel. efivars is the first pseudofilesystem that adds support for filesystem freezing and thawing. The simplicity comes from the fact that we simply always resync variable state after efivarfs has been frozen. It doesn't matter whether that's because of suspend, userspace initiated freeze or hibernation. Efivars is simple enough that it doesn't matter that we walk all dentries. There are no directories and there aren't insane amounts of entries and both freeze/thaw are already heavy-handed operations. If userspace initiated a freeze/thaw cycle they would need CAP_SYS_ADMIN in the initial user namespace (as that's where efivarfs is mounted) so it can't be triggered by random userspace. IOW, we really really don't care" * tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: f2fs: fix freezing filesystem during resize kernfs: add warning about implementing freeze/thaw efivarfs: support freeze/thaw power: freeze filesystems during suspend/resume libfs: export find_next_child() super: add filesystem freezing helpers for suspend and hibernate gfs2: pass through holder from the VFS for freeze/thaw super: use common iterator (Part 2) super: use a common iterator (Part 1) super: skip dying superblocks early super: simplify user_get_super() super: remove pointless s_root checks fs: allow all writers to be frozen locking/percpu-rwsem: add freezable alternative to down_read	2025-05-26 09:33:44 -07:00
Andreas Gruenbacher	e320050eb7	gfs2: No more gfs2_find_jhead caching We are no longer calling gfs2_find_jhead() on the same log twice, so there is no more reason for keeping the log contents cached across those calls. In addition, log head lookup and log header writing didn't go through the same address space and so the caching wasn't even fully working, anyway. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-22 09:12:27 +02:00
Andreas Gruenbacher	93bd5edbd6	gfs2: Get rid of duplicate log head lookup Currently at mount time, the recovery code looks up the current log head and, if necessary, replays the log and writes a recovery header to indicate that the log is clean. It does that for each log that may need recovery. We also know that our own log will always be checked as part of that process. Then, the mount code looks up the log head of our own log again. The double log head lookup can be costly, but more importantly, it is unnecessary because we can trivially compute the position of the log head after recovery; all we need to do for that is bump the position and lh_sequence by one when writing a recovery header. With that in mind, move the call to gfs2_log_pointers_init() into gfs2_recover_func() and get rid of the double lookup in gfs2_make_fs_rw(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-22 09:12:27 +02:00
Andreas Gruenbacher	2ebb94ab93	gfs2: Simplify clean_journal In function clean_journal(), update @head to point at the log header that indicates successful recovery: this is where logging needs to resume. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-22 09:12:27 +02:00
Andreas Gruenbacher	8a43d21876	gfs2: Simplify gfs2_log_pointers_init Move the initialization of sdp->sd_log_sequence and sdp->sd_log_flush_head inside gfs2_log_pointers_init(). Use gfs2_replay_incr_blk(). Before this change, the log head lookup code in freeze_go_xmote_bh() didn't update sdp->sd_log_flush_head. This is now fixed, but the code in freeze_go_xmote_bh() appears to be pretty useless in the first place: on a frozen filesystem, the log head will not change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-22 09:12:27 +02:00
Andreas Gruenbacher	703a4af356	gfs2: Move gfs2_log_pointers_init Move gfs2_log_pointers_init to recovery.c: there is no need for inlining this function. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-22 09:12:27 +02:00
Andreas Gruenbacher	91793971f3	gfs2: Minor comments fix Commit `4082976009` ("gfs2: Convert gfs2_find_jhead() to use a folio") replaced grab_cache_page() by filemap_grab_folio(), but the comments were still referring to grab_cache_page(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-22 09:12:27 +02:00
Andreas Gruenbacher	5a90f8d499	gfs2: Don't start unnecessary transactions during log flush Commit `8d391972ae` ("gfs2: Remove __gfs2_writepage()") changed the log flush code in gfs2_ail1_start_one() to call aops->writepages() instead of aops->writepage(). For jdata inodes, this means that we will now try to reserve log space and start a transaction before we can determine that the pages in question have already been journaled. When this happens in the context of gfs2_logd(), it can now appear that not enough log space is available for freeing up log space, and we will lock up. Fix that by issuing journal writes directly instead of going through aops->writepages() in the log flush code. Fixes: `8d391972ae` ("gfs2: Remove __gfs2_writepage()") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-22 09:12:27 +02:00
Andreas Gruenbacher	d50a64e3c5	gfs2: Move gfs2_trans_add_databufs Move gfs2_trans_add_databufs() to trans.c. Pass in a glock instead of a gfs2_inode. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-22 09:12:27 +02:00
Andreas Gruenbacher	2f022736ee	gfs2: Rename jdata_dirty_folio to gfs2_jdata_dirty_folio Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-22 09:12:27 +02:00
Eric Biggers	b6ccde39b1	gfs2: avoid inefficient use of crc32_le_shift() __get_log_header() was using crc32_le_shift() to update a CRC with four zero bytes. However, this is about 5x slower than just CRC'ing four zero bytes in the normal way. Just do that instead. (We could instead make crc32_le_shift() faster on short lengths. But all its callers do just fine without it, so I'd like to just remove it.) Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-22 09:12:27 +02:00
Andreas Gruenbacher	87faee382d	gfs2: Do not call iomap_zero_range beyond eof Since commit `eb65540aa9` ("iomap: warn on zero range of a post-eof folio"), iomap_zero_range() warns when asked to zero a folio beyond eof. The warning triggers on the following code path: gfs2_fallocate(FALLOC_FL_PUNCH_HOLE \| FALLOC_FL_KEEP_SIZE) __gfs2_punch_hole() gfs2_block_zero_range() iomap_zero_range() In __gfs2_punch_hole(), gfs2 zeroes out partial folios at the beginning and at the end of the specified range, whether those folios are beyond eof or not. This may add folios to the page cache which are entirely beyond eof, which isn't of any use. Avoid that by truncating the range to zero out at eof. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-22 09:12:26 +02:00
Christoph Hellwig	e9a4af22af	gfs: don't check for AOP_WRITEPAGE_ACTIVATE in gfs2_write_jdata_batch __gfs2_jdata_write_folio can't return AOP_WRITEPAGE_ACTIVATE, so don't check for it in gfs2_write_jdata_batch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-05-22 09:12:26 +02:00
Christian Brauner	1af3331764	super: add filesystem freezing helpers for suspend and hibernate Allow the power subsystem to support filesystem freeze for suspend and hibernate. For some kernel subsystems it is paramount that they are guaranteed that they are the owner of the freeze to avoid any risk of deadlocks. This is the case for the power subsystem. Enable it to recognize whether it did actually freeze the filesystem. If userspace has 10 filesystems and suspend/hibernate manges to freeze 5 and then fails on the 6th for whatever odd reason (current or future) then power needs to undo the freeze of the first 5 filesystems. It can't just walk the list again because while it's unlikely that a new filesystem got added in the meantime it still cannot tell which filesystems the power subsystem actually managed to get a freeze reference count on that needs to be dropped during thaw. There's various ways out of this ugliness. For example, record the filesystems the power subsystem managed to freeze on a temporary list in the callbacks and then walk that list backwards during thaw to undo the freezing or make sure that the power subsystem just actually exclusively freezes things it can freeze and marking such filesystems as being owned by power for the duration of the suspend or resume cycle. I opted for the latter as that seemed the clean thing to do even if it means more code changes. If hibernation races with filesystem freezing (e.g. DM reconfiguration), then hibernation need not freeze a filesystem because it's already frozen but userspace may thaw the filesystem before hibernation actually happens. If the race happens the other way around, DM reconfiguration may unexpectedly fail with EBUSY. So allow FREEZE_EXCL to nest with other holders. An exclusive freezer cannot be undone by any of the other concurrent freezers. Link: https://lore.kernel.org/r/20250329-work-freeze-v2-6-a47af37ecc3d@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-09 12:41:02 +02:00
Christoph Hellwig	65f8e62593	gfs2: use bdev_rw_virt in gfs2_read_super Switch gfs2_read_super to allocate the superblock buffer using kmalloc which falls back to the page allocator for PAGE_SIZE allocation but gives us a kernel virtual address and then use bdev_rw_virt to perform the synchronous read into it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250507120451.4000627-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-05-07 07:31:07 -06:00
Andrew Price	0a828c3ab0	gfs2: Fix usage of bio->bi_status in gfs2_end_log_write bio->bi_status is an index into the blk_errors array, not an errno. Its __bitwise tag is cast away here, resulting in a sparse warning: fs/gfs2/lops.c:207:22: warning: cast from restricted blk_status_t We could either add __force to the cast and continue logging bi_status in the error message, or we could look up the errno in the array and log that. As sdp->sd_log_error is used as an errno in all other cases, look up the errno here for consistency. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-04-24 23:12:15 +02:00
Andreas Gruenbacher	2c63986dd3	gfs2: deallocate inodes in gfs2_create_inode When creating and destroying inodes, we are relying on the inode hash table to make sure that for a given inode number, only a single inode will exist. We then link that inode to its inode and iopen glock and let those glocks point back at the inode. However, when iget_failed() is called, the inode is removed from the inode hash table before gfs_evict_inode() is called, and uniqueness is no longer guaranteed. Commit f1046a472b70 ("gfs2: gl_object races fix") was trying to work around that problem by detaching the inode glock from the inode before calling iget_failed(), but that broke the inode deallocation code in gfs_evict_inode(). To fix that, deallocate partially created inodes in gfs2_create_inode() instead of relying on gfs_evict_inode() for doing that. This means that gfs2_evict_inode() and its helper functions will no longer see partially created inodes, and so some simplifications are possible there. Fixes: `9ffa18884c` ("gfs2: gl_object races fix") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-04-24 23:10:05 +02:00
Andreas Gruenbacher	0cc617a54d	gfs2: Move GIF_ALLOC_FAILED check out of gfs2_ea_dealloc Don't check for the GIF_ALLOC_FAILED flag in gfs2_ea_dealloc() and pass that information explicitly instead. This allows for a cleaner follow-up patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-04-21 18:20:36 +02:00
Andreas Gruenbacher	bcd18105fb	gfs2: Move gfs2_dinode_dealloc Move gfs2_dinode_dealloc() and its helper gfs2_final_release_pages() from super.c to inode.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-04-21 18:20:36 +02:00
Andreas Gruenbacher	84a79ee68f	gfs2: Don't reread inodes unnecessarily In gfs2_create_inode(), we initialize the inode from scratch and then we write the result to disk. Clear the GLF_INSTANTIATE_NEEDED glock flag to indicate that the inode is up to date. Otherwise, the next time the inode glock is acquired, gfs2_instantiate() would reread the inode from disk, which isn't necessary. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-04-21 18:20:36 +02:00
Andreas Gruenbacher	af4044fd0b	gfs2: gfs2_create_inode error handling fix When gfs2_create_inode() finds a directory, make sure to return -EISDIR. Fixes: `571a4b5797` ("GFS2: bugger off early if O_CREAT open finds a directory") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-04-21 18:20:36 +02:00
Chen Ni	4023c3cbc3	gfs2: Remove unnecessary NULL check before free_percpu() free_percpu() checks for NULL pointers internally. Remove unneeded NULL check here. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-04-21 18:20:36 +02:00
Edward Adam Davis	27d2f101e7	gfs2: check sb_min_blocksize return value Check the return value of sb_min_blocksize(): it will be 0 when the requested block size is invalid. In addition, check the return value of sb_set_blocksize() as well. Reported-by: syzbot+b0018b7468b2af33b4d5@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-04-21 18:20:36 +02:00
Andreas Gruenbacher	ae9f3bd825	gfs2: replace sd_aspace with sd_inode Currently, sdp->sd_aspace and the per-inode metadata address spaces use sb->s_bdev->bd_mapping->host as their ->host; folios in those address spaces will thus appear to be on bdev rather than on gfs2 filesystems. This is a problem because gfs2 doesn't support cgroup writeback (SB_I_CGROUPWB), but bdev does. Fix that by using a "dummy" gfs2 inode as ->host in those address spaces. When coming from a folio, folio->mapping->host->i_sb will then be a gfs2 super block and the SB_I_CGROUPWB flag will not be set in sb->s_iflags. Based on a previous version from Bob Peterson from several years ago. Thanks to Tetsuo Handa, Jan Kara, and Rafael Aquini for helping figure this out. Fixes: `aaa2cacf81` ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-04-21 18:20:36 +02:00
Alexander Aring	ff22e5da42	gfs2: only apply DLM_LKF_VALBLK if sb_lvbptr is not NULL Currently, gfs2 always sets the DLM_LKF_VALBLK flag to enable lvb handling even when sb_lvbptr is NULL. This currently causes no problems because DLM ignores the DLM_LKF_VALBLK flag when sb_lvbptr is NULL, but it does violate the DLM API. Fix that by only setting DLM_LKF_VALBLK when sb_lvbptr is not NULL. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-04-21 18:20:36 +02:00
Alexander Aring	ac5ee087d3	gfs2: move msleep to sleepable context This patch moves the msleep_interruptible() out of the non-sleepable context by moving the ls->ls_recover_spin spinlock around so msleep_interruptible() will be called in a sleepable context. Cc: stable@vger.kernel.org Fixes: `4a7727725d` ("GFS2: Fix recovery issues for spectators") Suggested-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-04-21 18:20:36 +02:00
Christian Brauner	62a2175ddf	gfs2: pass through holder from the VFS for freeze/thaw The filesystem's freeze/thaw functions can be called from contexts where the holder isn't userspace but the kernel, e.g., during systemd suspend/hibernate. So pass through the freeze/thaw flags from the VFS instead of hard-coding them. Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 09:37:17 +02:00
Eric Biggers	b261d22220	lib/crc: remove CONFIG_LIBCRC32C Now that LIBCRC32C does nothing besides select CRC32, make every option that selects LIBCRC32C instead select CRC32 directly. Then remove LIBCRC32C. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250401221600.24878-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2025-04-04 11:31:42 -07:00
Linus Torvalds	ef479de65a	gfs2 changes - Fix two bugs related to locking request cancelation (locking request being retried instead of canceled; canceling the wrong locking request). - Prevent a race between inode creation and deferred delete analogous to commit `ffd1cf0443` from 6.13. This now allows to further simplify gfs2_evict_inode() without introducing mysterious problems. - When in inode delete should be verified / retried "later" but that isn't possible, skip the delete instead of carrying it out immediately. This broke in 6.13. - More folio conversions from Matthew Wilcox (plus a fix from Dan Carpenter). - Various minor fixes and cleanups. -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmfcehcUHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqfZA//XPqzf4fuS3E/SAouuHb4/MX8vmsL kQozDnCdJqYokU/AUjpwsCTIROURi4Xjfwuj6rd1u/IFDruioX93X/m9iCGH9TeE owI+qs+qQ5ZJom+KpoNuGPUw+40qlCOfIx87P3bW6xagerMiyzCdTBc7cTB6lKBi NsSShK71uMMLNYEAXJKl7koc9fD9bn143uElH8CLXlomuQkY9QPOD5r4jCJIaPu2 +RvlfF9zRYc2hYEjSh0daC4Arm1Y3B9Sin6YEIfXi/t53c5eQ1+Ttcw51t4RVBxx CSRVUUcDhCF6pof8YkJbPQVrCZqFzorisyUqMP+qE/VW8toFc6qJ9MzcMJwK0DNH aNjEK2s3qPCPU4/qM2V7J3dZMD3poJ8cHdAHFU6J5OVFems0kt8jHn8C/RV1KXm9 Cy/IWupKCMaiMIaoANrAC3xED0KOT11dHBKpYVOQhSJIRJZ+kbjdqKik13HmUAUp 2r/tlzZNG8hhfBLPCjA0Pz+pph6x/tJO1H24ooC5D24Gn83BKkS3QC/oBVok/I3Q /2g61gtVNUwIAxPDnl4IdSvvWHZeSTJRFYGRA13wGbG6I4SV9M4nS+4xrgb6D5DE dTRZiU22J+9OJQApnGi9ehOi/49yvySAyqAjVFx+LP+2tLCzj0mcvvLerkEG+V2c 3WkiUVkLpph+8BA= =yUe1 -----END PGP SIGNATURE----- Merge tag 'gfs2-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix two bugs related to locking request cancelation (locking request being retried instead of canceled; canceling the wrong locking request) - Prevent a race between inode creation and deferred delete analogous to commit `ffd1cf0443` from 6.13. This now allows to further simplify gfs2_evict_inode() without introducing mysterious problems - When in inode delete should be verified / retried "later" but that isn't possible, skip the delete instead of carrying it out immediately. This broke in 6.13 - More folio conversions from Matthew Wilcox (plus a fix from Dan Carpenter) - Various minor fixes and cleanups * tag 'gfs2-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (22 commits) gfs2: some comment clarifications gfs2: Fix a NULL vs IS_ERR() bug in gfs2_find_jhead() gfs2: Convert gfs2_meta_read_endio() to use a folio gfs2: Convert gfs2_end_log_write_bh() to work on a folio gfs2: Convert gfs2_find_jhead() to use a folio gfs2: Convert gfs2_jhead_pg_srch() to gfs2_jhead_folio_search() gfs2: Use b_folio in gfs2_check_magic() gfs2: Use b_folio in gfs2_submit_bhs() gfs2: Use b_folio in gfs2_trans_add_meta() gfs2: Use b_folio in gfs2_log_write_bh() gfs2: skip if we cannot defer delete gfs2: remove redundant warnings gfs2: minor evict fix gfs2: Prevent inode creation race (2) gfs2: Fix additional unlikely request cancelation race gfs2: Fix request cancelation bug gfs2: Check for empty queue in run_queue gfs2: Remove more dead code in add_to_queue gfs2: Replace GIF_DEFER_DELETE with GLF_DEFER_DELETE gfs2: glock holder GL_NOPID fix ...	2025-03-27 12:09:25 -07:00
Linus Torvalds	26d8e43079	vfs-6.15-rc1.async.dir -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rNwAKCRCRxhvAZXjc onBJAP9Z8Ywmlb5KQ1E3HvDmkwyY6yOSyZ9/CmbzrkCJ8ywYkQD/d9/xt0EP/O/q N8YtzXArHWt7u0YbcVpy9WK3F72BdwU= =VJgY -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs async dir updates from Christian Brauner: "This contains cleanups that fell out of the work from async directory handling: - Change kern_path_locked() and user_path_locked_at() to never return a negative dentry. This simplifies the usability of these helpers in various places - Drop d_exact_alias() from the remaining place in NFS where it is still used. This also allows us to drop the d_exact_alias() helper completely - Drop an unnecessary call to fh_update() from nfsd_create_locked() - Change i_op->mkdir() to return a struct dentry Change vfs_mkdir() to return a dentry provided by the filesystems which is hashed and positive. This allows us to reduce the number of cases where the resulting dentry is not positive to very few cases. The code in these places becomes simpler and easier to understand. - Repack DENTRY_* and LOOKUP_* flags" * tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: fix inline emphasis warning VFS: Change vfs_mkdir() to return the dentry. nfs: change mkdir inode_operation to return alternate dentry if needed. fuse: return correct dentry for ->mkdir ceph: return the correct dentry on mkdir hostfs: store inode in dentry after mkdir if possible. Change inode_operations.mkdir to return struct dentry * nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked() nfs/vfs: discard d_exact_alias() VFS: add common error checks to lookup_one_qstr_excl() VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry VFS: repack LOOKUP_ bit flags. VFS: repack DENTRY_ flags.	2025-03-24 10:47:14 -07:00
Linus Torvalds	0ec0d4ecdd	vfs-6.15-rc1.iomap -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rGwAKCRCRxhvAZXjc okVnAP9VgaYjWGzaeep/dLzWtu7C/Cg5Swl1P84Vj+SJ+hFPEAD/auzWTV0D0Ko5 5GLyUsLZehfeVDOSRqmiyt1po8iVsQo= =ANks -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs iomap updates from Christian Brauner: - Allow the filesystem to submit the writeback bios. - Allow the filsystem to track completions on a per-bio bases instead of the entire I/O. - Change writeback_ops so that ->submit_bio can be done by the filesystem. - A new ANON_WRITE flag for writes that don't have a block number assigned to them at the iomap level leaving the filesystem to do that work in the submission handler. - Incremental iterator advance The folio_batch support for zero range where the filesystem provides a batch of folios to process that might not be logically continguous requires more flexibility than the current offset based iteration currently offers. Update all iomap operations to advance the iterator within the operation and thus remove the need to advance from the core iomap iterator. - Make buffered writes work with RWF_DONTCACHE If RWF_DONTCACHE is set for a write, mark the folios being written as uncached. On writeback completion the pages will be dropped. - Introduce infrastructure for large atomic writes This will eventually be used by xfs and ext4. * tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits) iomap: rework IOMAP atomic flags iomap: comment on atomic write checks in iomap_dio_bio_iter() iomap: inline iomap_dio_bio_opflags() iomap: fix inline data on buffered read iomap: Lift blocksize restriction on atomic writes iomap: Support SW-based atomic writes iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW xfs: flag as supporting FOP_DONTCACHE iomap: make buffered writes work with RWF_DONTCACHE iomap: introduce a full map advance helper iomap: rename iomap_iter processed field to status iomap: remove unnecessary advance from iomap_iter() dax: advance the iomap_iter on pte and pmd faults dax: advance the iomap_iter on dedupe range dax: advance the iomap_iter on unshare range dax: advance the iomap_iter on zero range dax: push advance down into dax_iomap_iter() for read and write dax: advance the iomap_iter in the read/write path iomap: convert misc simple ops to incremental advance iomap: advance the iter on direct I/O ...	2025-03-24 10:19:31 -07:00
Andreas Gruenbacher	8cb70b91b2	gfs2: some comment clarifications Since commit `e1fa9ea85c` ("gfs2: Stop using glock holder auto-demotion for now"), we unconditionally drop the inode glock before trying to fault in more pages. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-18 13:21:39 +01:00
Dan Carpenter	951d701ef1	gfs2: Fix a NULL vs IS_ERR() bug in gfs2_find_jhead() The filemap_grab_folio() function doesn't return NULL, it returns error pointers. Fix the check to match. Fixes: `4082976009` ("gfs2: Convert gfs2_find_jhead() to use a folio") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-12 13:00:28 +01:00
Matthew Wilcox (Oracle)	0776a508d1	gfs2: Convert gfs2_meta_read_endio() to use a folio Switch from bio_for_each_segment_all() to bio_for_each_folio_all() which removes a call to page_buffers(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle)	536da2a440	gfs2: Convert gfs2_end_log_write_bh() to work on a folio gfs2_end_log_write() has to handle bios which consist of both pages which belong to folios and pages which were allocated from a mempool and do not belong to a folio. It would be cleaner to have separate endio handlers which handle each type, but it's not clear to me whether that's even possible. This patch is slightly forward-looking in that page_folio() cannot currently return NULL, but it will return NULL in the future for pages which do not belong to a folio. This was the last user of page_has_buffers(), so remove it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle)	4082976009	gfs2: Convert gfs2_find_jhead() to use a folio Remove a call to grab_cache_page() by using a folio throughout this function. [agruenba@redhat.com: Adjust to return value difference between bio_add_page() and bio_add_folio().] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle)	e00307e8d4	gfs2: Convert gfs2_jhead_pg_srch() to gfs2_jhead_folio_search() Pass in the folio instead of the page. Add an assert that this is not a large folio as we'd need a more complex solution if we wanted to kmap() each page out of a large folio. Removes a use of folio->page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> [agruenba@redhat.com: Rename gfs2_jhead_folio_srch() to gfs2_jhead_folio_search().] Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle)	e6ff5f2089	gfs2: Use b_folio in gfs2_check_magic() We are preparing to remove bh->b_page. Use kmap_local_folio() instead of kmap_local_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle)	072d732c05	gfs2: Use b_folio in gfs2_submit_bhs() Remove a reference to bh->b_page which is going to be removed soon. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle)	3f2fc848be	gfs2: Use b_folio in gfs2_trans_add_meta() The lock bit is maintained on the folio, not on the page. Saves two calls to compound_head() as well as removing two references to bh->b_page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle)	6576742b90	gfs2: Use b_folio in gfs2_log_write_bh() We are preparing to remove bh->b_page. gfs2_log_write() should continue to operate on pages as some of the memory being logged does not come from folios, so convert from folio to page in this function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:39 +01:00
Andreas Gruenbacher	41a8e04c94	gfs2: skip if we cannot defer delete In gfs2_evict_inode(), in the unlikely case that we cannot defer deleting the inode, it is not safe to fall back to deleting the inode; the only valid choice we have is to skip the delete. In addition, in evict_should_delete(), if we cannot lock the inode glock exclusively, we are in a bad enough state that skipping the delete is likely a better choice than trying to recover from the failure later. Fixes: `c5b7a2400e` ("gfs2: Only defer deletes when we have an iopen glock") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:38 +01:00
Andreas Gruenbacher	79fe790a32	gfs2: remove redundant warnings In glock_set_object() and glock_clear_object(), there is no need to print the glock type and number when we dump the entire glock, anyway. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:38 +01:00
Andreas Gruenbacher	e9e38ed725	gfs2: minor evict fix In evict_should_delete(), when gfs2_upgrade_iopen_glock() fails, we detach the iopen glock from the inode without calling glock_clear_object(). This leads to a warning in glock_set_object() when the same inode is recreated and the glock is reused. Fix that by only detaching the iopen glock in gfs2_evict_inode(). In addition, remove the dequeue code from evict_should_delete(); we already perform a conditional dequeue in gfs2_evict_inode(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:38 +01:00
Andreas Gruenbacher	9136cad723	gfs2: Prevent inode creation race (2) In gfs2_try_evict(), we try grabbing the inode to evict, we try to evict it, and then we try grabbing it again to see if it still exists. There is no guarantee that we will end up with the same inode both times; the inode validity check that commit `ffd1cf0443` ("gfs2: Prevent inode creation race") added to the first grab is actually needed both times. (To avoid code duplication, add a grab_existing_inode() helper.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:38 +01:00
Andreas Gruenbacher	6cb3b1c2df	gfs2: Fix additional unlikely request cancelation race In gfs2_glock_dq(), we must drop the glock spin lock before calling ->lm_cancel, but this means that in the meantime, the operation we are trying to cancel could complete. If the operation completes unsuccessfully, another holder can end up at the head of the queue and another ->lm_lock operation can get started. In this case, we would end up canceling that second operation by accident. To prevent that, introduce a new GLF_CANCELING flag. Set that flag in gfs2_glock_dq() when trying to cancel an operation. When seeing that flag, finish_xmote() will then keep the GLF_LOCK flag set to prevent other glock operations from taking place. gfs2_glock_dq() then completes the cancelation attempt by clearing GLF_LOCK and GLF_CANCELING. In addition, add a missing GLF_DEMOTE_IN_PROGRESS check in gfs2_glock_dq() to make sure that we won't accidentally cancel a demote request. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:38 +01:00
Andreas Gruenbacher	a431d49243	gfs2: Fix request cancelation bug In finish_xmote(), when a locking request is canceled, the corresponding holder is moved to the tail of the holders list instead of being dequeued immediately. When there is only a single holder, the canceled locking request is then immediately repeated. This makes no sense; it looks like another remnant of LM_FLAG_PRIORITY support. Instead, dequeue canceled holders and proceed with the next holder in finish_xmote(). We can then easily detect in gfs2_glock_dq() when a holder has been canceled. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2025-03-10 18:15:38 +01:00

1 2 3 4 5 ...

3050 Commits (f9db1fc56281b96fe8748632b3894de970a8a850)