Commit Graph

3050 Commits (f9db1fc56281b96fe8748632b3894de970a8a850)

Author SHA1 Message Date
Andrew Price 9126d2754c gfs2: Don't clear sb->s_fs_info in gfs2_sys_fs_add
When gfs2_sys_fs_add() fails, it sets sb->s_fs_info to NULL on its error
path (see commit 0d515210b6 ("GFS2: Add kobject release method")).
The intention seems to be to prevent dereferencing sb->s_fs_info once
the object pointed to has been deallocated, but that would be better
achieved by setting the pointer to NULL in free_sbd().

As a consequence, when the call to gfs2_sys_fs_add() fails in
gfs2_fill_super(), sdp = GFS2_SB(inode) will evaluate to NULL in iput()
-> gfs2_drop_inode(), and accessing sdp->sd_flags will be a NULL pointer
dereference.

Fix that by only setting sb->s_fs_info to NULL when actually freeing the
object pointed to in free_sbd().

Fixes: ae9f3bd825 ("gfs2: replace sd_aspace with sd_inode")
Reported-by: syzbot+b12826218502df019f9d@syzkaller.appspotmail.com
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-30 19:20:20 +02:00
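
The ownership rule described in commit 9126d2754c above can be sketched in a small userspace model. This is only an illustration under stated assumptions: `model_sb`, `model_sbd`, and the `model_*` functions are hypothetical stand-ins for `struct super_block`, `struct gfs2_sbd`, `gfs2_sys_fs_add()`, and `free_sbd()`, not the kernel code itself.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct super_block and struct gfs2_sbd. */
struct model_sb {
	void *s_fs_info;
};

struct model_sbd {
	struct model_sb *sd_vfs;
	unsigned long sd_flags;
};

/* Allocate the private data and publish it through sb->s_fs_info. */
static struct model_sbd *model_init_sbd(struct model_sb *sb)
{
	struct model_sbd *sdp = calloc(1, sizeof(*sdp));

	if (!sdp)
		return NULL;
	sdp->sd_vfs = sb;
	sb->s_fs_info = sdp;
	return sdp;
}

/* A setup step that fails; its error path leaves sb->s_fs_info alone. */
static int model_sys_fs_add(struct model_sbd *sdp)
{
	(void)sdp;
	return -1;	/* simulated failure */
}

/* The only place that clears the pointer: where the object is freed. */
static void model_free_sbd(struct model_sbd *sdp)
{
	struct model_sb *sb = sdp->sd_vfs;

	free(sdp);
	sb->s_fs_info = NULL;
}
```

In this model, code that runs between the failed setup step and `model_free_sbd()` can still reach the private data through `s_fs_info`, which is the property the commit message asks for.
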
Linus Torvalds 8fdabcd9c0 gfs2 changes
 - Fix the long-standing warnings in inode_to_wb() when CONFIG_LOCKDEP is
   enabled: gfs2 doesn't support cgroup writeback and so inode->i_wb will never
   change.  This is the counterpart of commit 9e888998ea ("writeback: fix
   false warning in inode_to_wb()").
 
 - Fix a hang introduced by commit 8d391972ae ("gfs2: Remove
   __gfs2_writepage()"): prevent gfs2_logd from creating transactions for jdata
   pages while trying to flush the log.
 
 - Fix a race between gfs2_create_inode() and gfs2_evict_inode() by deallocating
   partially created inodes on the gfs2_create_inode() error path.
 
 - Fix a bug in the journal head lookup code that could cause mount to fail
   after successful recovery.
 
 - Various smaller fixes and cleanups from various people.

Merge tag 'gfs2-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Fix the long-standing warnings in inode_to_wb() when CONFIG_LOCKDEP
   is enabled: gfs2 doesn't support cgroup writeback and so inode->i_wb
   will never change. This is the counterpart of commit 9e888998ea
   ("writeback: fix false warning in inode_to_wb()")

 - Fix a hang introduced by commit 8d391972ae ("gfs2: Remove
   __gfs2_writepage()"): prevent gfs2_logd from creating transactions
   for jdata pages while trying to flush the log

 - Fix a race between gfs2_create_inode() and gfs2_evict_inode() by
   deallocating partially created inodes on the gfs2_create_inode()
   error path

 - Fix a bug in the journal head lookup code that could cause mount to
   fail after successful recovery

 - Various smaller fixes and cleanups from various people

* tag 'gfs2-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (23 commits)
  gfs2: No more gfs2_find_jhead caching
  gfs2: Get rid of duplicate log head lookup
  gfs2: Simplify clean_journal
  gfs2: Simplify gfs2_log_pointers_init
  gfs2: Move gfs2_log_pointers_init
  gfs2: Minor comments fix
  gfs2: Don't start unnecessary transactions during log flush
  gfs2: Move gfs2_trans_add_databufs
  gfs2: Rename jdata_dirty_folio to gfs2_jdata_dirty_folio
  gfs2: avoid inefficient use of crc32_le_shift()
  gfs2: Do not call iomap_zero_range beyond eof
  gfs: don't check for AOP_WRITEPAGE_ACTIVATE in gfs2_write_jdata_batch
  gfs2: Fix usage of bio->bi_status in gfs2_end_log_write
  gfs2: deallocate inodes in gfs2_create_inode
  gfs2: Move GIF_ALLOC_FAILED check out of gfs2_ea_dealloc
  gfs2: Move gfs2_dinode_dealloc
  gfs2: Don't reread inodes unnecessarily
  gfs2: gfs2_create_inode error handling fix
  gfs2: Remove unnecessary NULL check before free_percpu()
  gfs2: check sb_min_blocksize return value
  ...
2025-05-26 12:35:08 -07:00
Linus Torvalds 6f59de9bc0 for-6.16/block-20250523

Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - ublk updates:
      - Add support for updating the size of a ublk instance
      - Zero-copy improvements
      - Auto-registering of buffers for zero-copy
      - Series simplifying and improving GET_DATA and request lookup
      - Series adding quiesce support
      - Lots of selftests additions
      - Various cleanups

 - NVMe updates via Christoph:
      - add per-node DMA pools and use them for PRP/SGL allocations
        (Caleb Sander Mateos, Keith Busch)
      - nvme-fcloop refcounting fixes (Daniel Wagner)
      - support delayed removal of the multipath node and optionally
        support the multipath node for private namespaces (Nilay Shroff)
      - support shared CQs in the PCI endpoint target code (Wilfred
        Mallawa)
      - support admin-queue only authentication (Hannes Reinecke)
      - use the crc32c library instead of the crypto API (Eric Biggers)
      - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes
        Reinecke, Leon Romanovsky, Gustavo A. R. Silva)

 - MD updates via Yu:
      - Fix normal IO being starved by sync IO, found by mkfs on a
        newly created large raid5, with some cleanup patches for bdev
        inflight counters

 - Clean up brd, getting rid of atomic kmaps and bvec poking

 - Add loop driver specifically for zoned IO testing

 - Eliminate blk-rq-qos calls with a static key, if not enabled

 - Improve hctx locking for when a plug has IO for multiple queues
   pending

 - Remove block layer bouncing support, which in turn means we can
   remove the per-node bounce stat as well

 - Improve blk-throttle support

 - Improve delay support for blk-throttle

 - Improve brd discard support

 - Unify IO scheduler switching. This should also fix a bunch of lockdep
   warnings we've been seeing, after enabling lockdep support for queue
   freezing/unfreezing

 - Add support for block write streams via FDP (flexible data placement)
   on NVMe

 - Add a bunch of block helpers, facilitating the removal of a bunch of
   duplicated boilerplate code

 - Remove obsolete BLK_MQ pci and virtio Kconfig options

 - Add atomic/untorn write support to blktrace

 - Various little cleanups and fixes

* tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits)
  selftests: ublk: add test for UBLK_F_QUIESCE
  ublk: add feature UBLK_F_QUIESCE
  selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
  traceevent/block: Add REQ_ATOMIC flag to block trace events
  ublk: run auto buf unregisgering in same io_ring_ctx with registering
  io_uring: add helper io_uring_cmd_ctx_handle()
  ublk: remove io argument from ublk_auto_buf_reg_fallback()
  ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch()
  selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK
  selftests: ublk: support UBLK_F_AUTO_BUF_REG
  ublk: support UBLK_AUTO_BUF_REG_FALLBACK
  ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG
  ublk: prepare for supporting to register request buffer automatically
  ublk: convert to refcount_t
  selftests: ublk: make IO & device removal test more stressful
  nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk
  nvme: introduce multipath_always_on module param
  nvme-multipath: introduce delayed removal of the multipath head node
  nvme-pci: derive and better document max segments limits
  nvme-pci: use struct_size for allocation struct nvme_dev
  ...
2025-05-26 11:39:36 -07:00
Linus Torvalds 8dd53535f1 vfs-6.16-rc1.super

Merge tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs freezing updates from Christian Brauner:
 "This contains various filesystem freezing related work for this cycle:

   - Allow the power subsystem to support filesystem freeze for suspend
     and hibernate.

     Now all the pieces are in place to actually allow the power
     subsystem to freeze/thaw filesystems during suspend/resume.
     Filesystems are only frozen and thawed if the power subsystem does
     actually own the freeze.

     If the filesystem is already frozen by the time we've frozen all
     userspace processes we don't care to freeze it again. That's
     userspace's job once the process resumes. We only actually freeze
     filesystems if we absolutely have to and we ignore other failures
     to freeze.

     We could bubble up errors and fail suspend/resume if the error
     isn't EBUSY (aka it's already frozen) but I don't think that this
     is worth it. Filesystem freezing during suspend/resume is
     best-effort. If the user has 500 ext4 filesystems mounted and 4
     fail to freeze for whatever reason then we simply skip them.

     What we have now is already a big improvement and let's see how we
     fare with it before making our lives even harder (and uglier) than
     we have to.

   - Allow efivars to support freeze and thaw

     Allow efivarfs to take part in resyncing variable state during
     system hibernation and suspend. Add freeze/thaw support.

     This is a pretty straightforward implementation. We simply add
     regular freeze/thaw support for both userspace and the kernel.
     efivars is the first pseudofilesystem that adds support for
     filesystem freezing and thawing.

     The simplicity comes from the fact that we simply always resync
     variable state after efivarfs has been frozen. It doesn't matter
     whether that's because of suspend, userspace initiated freeze or
     hibernation. Efivars is simple enough that it doesn't matter that
     we walk all dentries. There are no directories and there aren't
     insane amounts of entries and both freeze/thaw are already
     heavy-handed operations. If userspace initiated a freeze/thaw cycle
     they would need CAP_SYS_ADMIN in the initial user namespace (as
     that's where efivarfs is mounted) so it can't be triggered by
     random userspace. IOW, we really really don't care"

* tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  f2fs: fix freezing filesystem during resize
  kernfs: add warning about implementing freeze/thaw
  efivarfs: support freeze/thaw
  power: freeze filesystems during suspend/resume
  libfs: export find_next_child()
  super: add filesystem freezing helpers for suspend and hibernate
  gfs2: pass through holder from the VFS for freeze/thaw
  super: use common iterator (Part 2)
  super: use a common iterator (Part 1)
  super: skip dying superblocks early
  super: simplify user_get_super()
  super: remove pointless s_root checks
  fs: allow all writers to be frozen
  locking/percpu-rwsem: add freezable alternative to down_read
2025-05-26 09:33:44 -07:00
Andreas Gruenbacher e320050eb7 gfs2: No more gfs2_find_jhead caching
We are no longer calling gfs2_find_jhead() on the same log twice, so
there is no more reason for keeping the log contents cached across those
calls.  In addition, log head lookup and log header writing didn't go
through the same address space and so the caching wasn't even fully
working, anyway.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22 09:12:27 +02:00
Andreas Gruenbacher 93bd5edbd6 gfs2: Get rid of duplicate log head lookup
Currently at mount time, the recovery code looks up the current log head
and, if necessary, replays the log and writes a recovery header to
indicate that the log is clean.  It does that for each log that may need
recovery.  We also know that our own log will always be checked as part
of that process.  Then, the mount code looks up the log head of our own
log again.

The double log head lookup can be costly, but more importantly, it is
unnecessary because we can trivially compute the position of the log
head after recovery; all we need to do for that is bump the position and
lh_sequence by one when writing a recovery header.

With that in mind, move the call to gfs2_log_pointers_init() into
gfs2_recover_func() and get rid of the double lookup in
gfs2_make_fs_rw().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22 09:12:27 +02:00
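
The arithmetic behind commit 93bd5edbd6 above ("bump the position and lh_sequence by one") can be sketched as follows. This is a userspace illustration, not the kernel implementation: `model_log_head`, `model_incr_blk()`, and `model_head_after_recovery()` are hypothetical names, and the wrap-around at the end of the journal is an assumption based on the journal being circular.

```c
#include <stdint.h>

/* Hypothetical, reduced view of a log header: sequence number and block. */
struct model_log_head {
	uint64_t lh_sequence;
	uint32_t lh_blkno;
};

/* Advance a journal block number by one, wrapping at the journal size. */
static uint32_t model_incr_blk(uint32_t blk, uint32_t nblocks)
{
	return (blk + 1 == nblocks) ? 0 : blk + 1;
}

/*
 * Given the head found by the lookup, compute where the recovery header
 * lands: one block further on, with the sequence number bumped by one.
 */
static struct model_log_head
model_head_after_recovery(struct model_log_head found, uint32_t nblocks)
{
	struct model_log_head next = {
		.lh_sequence = found.lh_sequence + 1,
		.lh_blkno = model_incr_blk(found.lh_blkno, nblocks),
	};

	return next;
}
```

Because the new head follows directly from the old one, a second lookup of the same log adds no information in this model.
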
Andreas Gruenbacher 2ebb94ab93 gfs2: Simplify clean_journal
In function clean_journal(), update @head to point at the log header
that indicates successful recovery:  this is where logging needs to
resume.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22 09:12:27 +02:00
Andreas Gruenbacher 8a43d21876 gfs2: Simplify gfs2_log_pointers_init
Move the initialization of sdp->sd_log_sequence and
sdp->sd_log_flush_head inside gfs2_log_pointers_init().  Use
gfs2_replay_incr_blk().

Before this change, the log head lookup code in freeze_go_xmote_bh()
didn't update sdp->sd_log_flush_head.  This is now fixed, but the code
in freeze_go_xmote_bh() appears to be pretty useless in the first place:
on a frozen filesystem, the log head will not change.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22 09:12:27 +02:00
Andreas Gruenbacher 703a4af356 gfs2: Move gfs2_log_pointers_init
Move gfs2_log_pointers_init to recovery.c: there is no need for inlining
this function.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22 09:12:27 +02:00
Andreas Gruenbacher 91793971f3 gfs2: Minor comments fix
Commit 4082976009 ("gfs2: Convert gfs2_find_jhead() to use a folio")
replaced grab_cache_page() by filemap_grab_folio(), but the comments
were still referring to grab_cache_page().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22 09:12:27 +02:00
Andreas Gruenbacher 5a90f8d499 gfs2: Don't start unnecessary transactions during log flush
Commit 8d391972ae ("gfs2: Remove __gfs2_writepage()") changed the log
flush code in gfs2_ail1_start_one() to call aops->writepages() instead
of aops->writepage().  For jdata inodes, this means that we will now try
to reserve log space and start a transaction before we can determine
that the pages in question have already been journaled.  When this
happens in the context of gfs2_logd(), it can now appear that not enough
log space is available for freeing up log space, and we will lock up.

Fix that by issuing journal writes directly instead of going through
aops->writepages() in the log flush code.

Fixes: 8d391972ae ("gfs2: Remove __gfs2_writepage()")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22 09:12:27 +02:00
Andreas Gruenbacher d50a64e3c5 gfs2: Move gfs2_trans_add_databufs
Move gfs2_trans_add_databufs() to trans.c.  Pass in a glock instead of
a gfs2_inode.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22 09:12:27 +02:00
Andreas Gruenbacher 2f022736ee gfs2: Rename jdata_dirty_folio to gfs2_jdata_dirty_folio
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22 09:12:27 +02:00
Eric Biggers b6ccde39b1 gfs2: avoid inefficient use of crc32_le_shift()
__get_log_header() was using crc32_le_shift() to update a CRC with four
zero bytes.  However, this is about 5x slower than just CRC'ing four
zero bytes in the normal way.  Just do that instead.

(We could instead make crc32_le_shift() faster on short lengths.  But
all its callers do just fine without it, so I'd like to just remove it.)

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22 09:12:27 +02:00
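
What commit b6ccde39b1 above calls "CRC'ing four zero bytes in the normal way" can be illustrated with a plain bitwise CRC-32 update. This is a sketch under assumptions: `model_crc32_le()` is a hypothetical userspace routine using the reflected polynomial 0xEDB88320 with no pre- or post-inversion, and it says nothing about how the kernel's optimized implementation is written.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise little-endian CRC-32 update (reflected polynomial 0xEDB88320). */
static uint32_t model_crc32_le(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
	}
	return crc;
}

/* Extend a running CRC as if four zero bytes followed the data. */
static uint32_t model_crc32_four_zeroes(uint32_t crc)
{
	static const unsigned char zeroes[4];

	return model_crc32_le(crc, zeroes, sizeof(zeroes));
}
```

Feeding the zero bytes through the ordinary update gives the same result as computing the CRC over a buffer that physically contains them, so no dedicated shift helper is needed for this case.
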
Andreas Gruenbacher 87faee382d gfs2: Do not call iomap_zero_range beyond eof
Since commit eb65540aa9 ("iomap: warn on zero range of a post-eof
folio"), iomap_zero_range() warns when asked to zero a folio beyond eof.
The warning triggers on the following code path:

  gfs2_fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
    __gfs2_punch_hole()
      gfs2_block_zero_range()
        iomap_zero_range()

In __gfs2_punch_hole(), gfs2 zeroes out partial folios at the beginning
and at the end of the specified range, whether those folios are beyond
eof or not.  This may add folios to the page cache which are entirely
beyond eof, which isn't of any use.  Avoid that by truncating the range
to zero out at eof.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22 09:12:26 +02:00
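
The clamping idea in commit 87faee382d above can be sketched as a small helper. This is an illustration only: `model_clamp_to_eof()` is a hypothetical function, and the actual change in `__gfs2_punch_hole()` may be structured differently.

```c
#include <stdint.h>

/*
 * Clamp the byte range [offset, offset + length) so that it does not
 * extend past isize.  Returns the clamped length, or 0 when the range
 * starts at or beyond end of file and there is nothing to zero.
 */
static uint64_t model_clamp_to_eof(uint64_t offset, uint64_t length,
				   uint64_t isize)
{
	if (offset >= isize)
		return 0;
	if (length > isize - offset)
		return isize - offset;
	return length;
}
```

A caller would skip the zeroing step entirely when the helper returns 0, which avoids instantiating page cache folios that lie wholly beyond eof.
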
Christoph Hellwig e9a4af22af gfs: don't check for AOP_WRITEPAGE_ACTIVATE in gfs2_write_jdata_batch
__gfs2_jdata_write_folio can't return AOP_WRITEPAGE_ACTIVATE, so don't
check for it in gfs2_write_jdata_batch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22 09:12:26 +02:00
Christian Brauner 1af3331764 super: add filesystem freezing helpers for suspend and hibernate
Allow the power subsystem to support filesystem freeze for
suspend and hibernate.

For some kernel subsystems it is paramount that they are guaranteed that
they are the owner of the freeze to avoid any risk of deadlocks. This is
the case for the power subsystem. Enable it to recognize whether it did
actually freeze the filesystem.

If userspace has 10 filesystems and suspend/hibernate manages to freeze 5
and then fails on the 6th for whatever odd reason (current or future)
then power needs to undo the freeze of the first 5 filesystems. It can't
just walk the list again because while it's unlikely that a new
filesystem got added in the meantime it still cannot tell which
filesystems the power subsystem actually managed to get a freeze
reference count on that needs to be dropped during thaw.

There are various ways out of this ugliness. For example, record the
filesystems the power subsystem managed to freeze on a temporary list in
the callbacks and then walk that list backwards during thaw to undo the
freezing, or make sure that the power subsystem exclusively freezes only
the things it can freeze and mark such filesystems as being owned by
power for the duration of the suspend or resume cycle. I opted for the
latter as that seemed the clean thing to do even if it means more code
changes.

If hibernation races with filesystem freezing (e.g. DM reconfiguration),
then hibernation need not freeze a filesystem because it's already
frozen but userspace may thaw the filesystem before hibernation actually
happens.

If the race happens the other way around, DM reconfiguration may
unexpectedly fail with EBUSY.

So allow FREEZE_EXCL to nest with other holders. An exclusive freezer
cannot be undone by any of the other concurrent freezers.

Link: https://lore.kernel.org/r/20250329-work-freeze-v2-6-a47af37ecc3d@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09 12:41:02 +02:00
Christoph Hellwig 65f8e62593 gfs2: use bdev_rw_virt in gfs2_read_super
Switch gfs2_read_super to allocate the superblock buffer using kmalloc
which falls back to the page allocator for PAGE_SIZE allocation but
gives us a kernel virtual address and then use bdev_rw_virt to perform
the synchronous read into it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250507120451.4000627-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-07 07:31:07 -06:00
Andrew Price 0a828c3ab0 gfs2: Fix usage of bio->bi_status in gfs2_end_log_write
bio->bi_status is an index into the blk_errors array, not an errno. Its
__bitwise tag is cast away here, resulting in a sparse warning:

  fs/gfs2/lops.c:207:22: warning: cast from restricted blk_status_t

We could either add __force to the cast and continue logging bi_status
in the error message, or we could look up the errno in the array and log
that. As sdp->sd_log_error is used as an errno in all other cases, look
up the errno here for consistency.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-24 23:12:15 +02:00
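
The distinction drawn in commit 0a828c3ab0 above, between a status that indexes a table and the errno stored in that table, can be modelled in userspace. Everything here is hypothetical: the `model_*` names, the three status codes, and their numbering are invented for illustration and do not match the kernel's blk_errors array.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical status type: an index into a table, not an errno. */
typedef unsigned char model_blk_status_t;

enum {
	MODEL_BLK_STS_OK = 0,
	MODEL_BLK_STS_NOSPC = 1,
	MODEL_BLK_STS_IOERR = 2,
};

/* Table mapping each status index to a negative errno value. */
static const int model_blk_errors[] = {
	[MODEL_BLK_STS_OK] = 0,
	[MODEL_BLK_STS_NOSPC] = -ENOSPC,
	[MODEL_BLK_STS_IOERR] = -EIO,
};

/* Look the errno up instead of casting the status to an int. */
static int model_blk_status_to_errno(model_blk_status_t status)
{
	size_t idx = status;

	if (idx >= sizeof(model_blk_errors) / sizeof(model_blk_errors[0]))
		return -EIO;	/* unknown status: report a generic I/O error */
	return model_blk_errors[idx];
}
```

Casting `MODEL_BLK_STS_NOSPC` directly would yield 1, which is meaningless as an errno; the lookup yields `-ENOSPC`, which is consistent with a field that is otherwise used as an errno.
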
Andreas Gruenbacher 2c63986dd3 gfs2: deallocate inodes in gfs2_create_inode
When creating and destroying inodes, we are relying on the inode hash
table to make sure that for a given inode number, only a single inode
will exist.  We then link that inode to its inode and iopen glock and
let those glocks point back at the inode.  However, when iget_failed()
is called, the inode is removed from the inode hash table before
gfs2_evict_inode() is called, and uniqueness is no longer guaranteed.

Commit f1046a472b70 ("gfs2: gl_object races fix") was trying to work
around that problem by detaching the inode glock from the inode before
calling iget_failed(), but that broke the inode deallocation code in
gfs2_evict_inode().

To fix that, deallocate partially created inodes in gfs2_create_inode()
instead of relying on gfs2_evict_inode() for doing that.

This means that gfs2_evict_inode() and its helper functions will no
longer see partially created inodes, and so some simplifications are
possible there.

Fixes: 9ffa18884c ("gfs2: gl_object races fix")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-24 23:10:05 +02:00
Andreas Gruenbacher 0cc617a54d gfs2: Move GIF_ALLOC_FAILED check out of gfs2_ea_dealloc
Don't check for the GIF_ALLOC_FAILED flag in gfs2_ea_dealloc() and pass
that information explicitly instead.  This allows for a cleaner
follow-up patch.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21 18:20:36 +02:00
Andreas Gruenbacher bcd18105fb gfs2: Move gfs2_dinode_dealloc
Move gfs2_dinode_dealloc() and its helper gfs2_final_release_pages()
from super.c to inode.c.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21 18:20:36 +02:00
Andreas Gruenbacher 84a79ee68f gfs2: Don't reread inodes unnecessarily
In gfs2_create_inode(), we initialize the inode from scratch and then we
write the result to disk.  Clear the GLF_INSTANTIATE_NEEDED glock flag
to indicate that the inode is up to date.  Otherwise, the next time the
inode glock is acquired, gfs2_instantiate() would reread the inode from
disk, which isn't necessary.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21 18:20:36 +02:00
Andreas Gruenbacher af4044fd0b gfs2: gfs2_create_inode error handling fix
When gfs2_create_inode() finds a directory, make sure to return -EISDIR.

Fixes: 571a4b5797 ("GFS2: bugger off early if O_CREAT open finds a directory")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21 18:20:36 +02:00
Chen Ni 4023c3cbc3 gfs2: Remove unnecessary NULL check before free_percpu()
free_percpu() checks for NULL pointers internally.
Remove unneeded NULL check here.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21 18:20:36 +02:00
Edward Adam Davis 27d2f101e7 gfs2: check sb_min_blocksize return value
Check the return value of sb_min_blocksize(): it will be 0 when the
requested block size is invalid.

In addition, check the return value of sb_set_blocksize() as well.

Reported-by: syzbot+b0018b7468b2af33b4d5@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21 18:20:36 +02:00
Andreas Gruenbacher ae9f3bd825 gfs2: replace sd_aspace with sd_inode
Currently, sdp->sd_aspace and the per-inode metadata address spaces use
sb->s_bdev->bd_mapping->host as their ->host; folios in those address
spaces will thus appear to be on bdev rather than on gfs2 filesystems.
This is a problem because gfs2 doesn't support cgroup writeback
(SB_I_CGROUPWB), but bdev does.

Fix that by using a "dummy" gfs2 inode as ->host in those address
spaces.  When coming from a folio, folio->mapping->host->i_sb will then
be a gfs2 super block and the SB_I_CGROUPWB flag will not be set in
sb->s_iflags.

Based on a previous version from Bob Peterson from several years ago.
Thanks to Tetsuo Handa, Jan Kara, and Rafael Aquini for helping figure
this out.

Fixes: aaa2cacf81 ("writeback: add lockdep annotation to inode_to_wb()")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21 18:20:36 +02:00
Alexander Aring ff22e5da42 gfs2: only apply DLM_LKF_VALBLK if sb_lvbptr is not NULL
Currently, gfs2 always sets the DLM_LKF_VALBLK flag to enable lvb
handling even when sb_lvbptr is NULL.  This currently causes no problems
because DLM ignores the DLM_LKF_VALBLK flag when sb_lvbptr is NULL, but
it does violate the DLM API.  Fix that by only setting DLM_LKF_VALBLK
when sb_lvbptr is not NULL.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21 18:20:36 +02:00
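
The conditional flag handling from commit ff22e5da42 above can be sketched as follows. This is a userspace illustration under assumptions: `MODEL_LKF_VALBLK` uses a placeholder bit value, and `model_lksb` and `model_make_flags()` are hypothetical names rather than the DLM or gfs2 definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit standing in for DLM_LKF_VALBLK; the value is made up. */
#define MODEL_LKF_VALBLK 0x1u

/* Hypothetical, reduced lock status block: only the lvb pointer matters. */
struct model_lksb {
	char *sb_lvbptr;
};

/* Request lock value block handling only when a buffer is attached. */
static uint32_t model_make_flags(const struct model_lksb *lksb, uint32_t base)
{
	if (lksb->sb_lvbptr)
		base |= MODEL_LKF_VALBLK;
	return base;
}
```

With this shape the flag and the buffer pointer cannot disagree, so the caller no longer depends on the lock manager tolerating a flag without a buffer.
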
Alexander Aring ac5ee087d3 gfs2: move msleep to sleepable context
This patch moves the msleep_interruptible() out of the non-sleepable
context by moving the ls->ls_recover_spin spinlock around so
msleep_interruptible() will be called in a sleepable context.

Cc: stable@vger.kernel.org
Fixes: 4a7727725d ("GFS2: Fix recovery issues for spectators")
Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21 18:20:36 +02:00
Christian Brauner 62a2175ddf gfs2: pass through holder from the VFS for freeze/thaw
The filesystem's freeze/thaw functions can be called from contexts where
the holder isn't userspace but the kernel, e.g., during systemd
suspend/hibernate. So pass through the freeze/thaw flags from the VFS
instead of hard-coding them.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 09:37:17 +02:00
Eric Biggers b261d22220 lib/crc: remove CONFIG_LIBCRC32C
Now that LIBCRC32C does nothing besides select CRC32, make every option
that selects LIBCRC32C instead select CRC32 directly.  Then remove
LIBCRC32C.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250401221600.24878-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-04-04 11:31:42 -07:00
Linus Torvalds ef479de65a gfs2 changes
 - Fix two bugs related to locking request cancelation (locking request
   being retried instead of canceled; canceling the wrong locking
   request).
 
 - Prevent a race between inode creation and deferred delete analogous
   to commit ffd1cf0443 from 6.13.  This now allows gfs2_evict_inode() to be
   simplified further without introducing mysterious problems.
 
 - When an inode delete should be verified / retried "later" but that
   isn't possible, skip the delete instead of carrying it out
   immediately.  This broke in 6.13.
 
 - More folio conversions from Matthew Wilcox (plus a fix from Dan
   Carpenter).
 
 - Various minor fixes and cleanups.

Merge tag 'gfs2-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Fix two bugs related to locking request cancelation (locking request
   being retried instead of canceled; canceling the wrong locking
   request)

 - Prevent a race between inode creation and deferred delete analogous
   to commit ffd1cf0443 from 6.13. This now allows gfs2_evict_inode() to
   be simplified further without introducing mysterious problems

 - When an inode delete should be verified / retried "later" but that
   isn't possible, skip the delete instead of carrying it out
   immediately. This broke in 6.13

 - More folio conversions from Matthew Wilcox (plus a fix from Dan
   Carpenter)

 - Various minor fixes and cleanups

* tag 'gfs2-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (22 commits)
  gfs2: some comment clarifications
  gfs2: Fix a NULL vs IS_ERR() bug in gfs2_find_jhead()
  gfs2: Convert gfs2_meta_read_endio() to use a folio
  gfs2: Convert gfs2_end_log_write_bh() to work on a folio
  gfs2: Convert gfs2_find_jhead() to use a folio
  gfs2: Convert gfs2_jhead_pg_srch() to gfs2_jhead_folio_search()
  gfs2: Use b_folio in gfs2_check_magic()
  gfs2: Use b_folio in gfs2_submit_bhs()
  gfs2: Use b_folio in gfs2_trans_add_meta()
  gfs2: Use b_folio in gfs2_log_write_bh()
  gfs2: skip if we cannot defer delete
  gfs2: remove redundant warnings
  gfs2: minor evict fix
  gfs2: Prevent inode creation race (2)
  gfs2: Fix additional unlikely request cancelation race
  gfs2: Fix request cancelation bug
  gfs2: Check for empty queue in run_queue
  gfs2: Remove more dead code in add_to_queue
  gfs2: Replace GIF_DEFER_DELETE with GLF_DEFER_DELETE
  gfs2: glock holder GL_NOPID fix
  ...
2025-03-27 12:09:25 -07:00
Linus Torvalds 26d8e43079 vfs-6.15-rc1.async.dir

Merge tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs async dir updates from Christian Brauner:
 "This contains cleanups that fell out of the work from async directory
  handling:

   - Change kern_path_locked() and user_path_locked_at() to never return
     a negative dentry. This simplifies the usability of these helpers
     in various places

   - Drop d_exact_alias() from the remaining place in NFS where it is
     still used. This also allows us to drop the d_exact_alias() helper
     completely

   - Drop an unnecessary call to fh_update() from nfsd_create_locked()

   - Change i_op->mkdir() to return a struct dentry

     Change vfs_mkdir() to return a dentry provided by the filesystems
     which is hashed and positive. This allows us to reduce the number
     of cases where the resulting dentry is not positive to very few
     cases. The code in these places becomes simpler and easier to
     understand.

   - Repack DENTRY_* and LOOKUP_* flags"

* tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  doc: fix inline emphasis warning
  VFS: Change vfs_mkdir() to return the dentry.
  nfs: change mkdir inode_operation to return alternate dentry if needed.
  fuse: return correct dentry for ->mkdir
  ceph: return the correct dentry on mkdir
  hostfs: store inode in dentry after mkdir if possible.
  Change inode_operations.mkdir to return struct dentry *
  nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
  nfs/vfs: discard d_exact_alias()
  VFS: add common error checks to lookup_one_qstr_excl()
  VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  VFS: repack LOOKUP_ bit flags.
  VFS: repack DENTRY_ flags.
2025-03-24 10:47:14 -07:00
Linus Torvalds 0ec0d4ecdd vfs-6.15-rc1.iomap
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rGwAKCRCRxhvAZXjc
 okVnAP9VgaYjWGzaeep/dLzWtu7C/Cg5Swl1P84Vj+SJ+hFPEAD/auzWTV0D0Ko5
 5GLyUsLZehfeVDOSRqmiyt1po8iVsQo=
 =ANks
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs iomap updates from Christian Brauner:

 - Allow the filesystem to submit the writeback bios.

    - Allow the filesystem to track completions on a per-bio basis
      instead of the entire I/O.

    - Change writeback_ops so that ->submit_bio can be done by the
      filesystem.

    - A new ANON_WRITE flag for writes that don't have a block number
      assigned to them at the iomap level, leaving the filesystem to do
      that work in the submission handler.

 - Incremental iterator advance

   The folio_batch support for zero range, where the filesystem provides
   a batch of folios to process that might not be logically contiguous,
   requires more flexibility than the current offset-based iteration
   offers.

   Update all iomap operations to advance the iterator within the
   operation and thus remove the need to advance from the core iomap
   iterator.

 - Make buffered writes work with RWF_DONTCACHE

   If RWF_DONTCACHE is set for a write, mark the folios being written as
   uncached. On writeback completion the pages will be dropped.

 - Introduce infrastructure for large atomic writes

   This will eventually be used by xfs and ext4.

* tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits)
  iomap: rework IOMAP atomic flags
  iomap: comment on atomic write checks in iomap_dio_bio_iter()
  iomap: inline iomap_dio_bio_opflags()
  iomap: fix inline data on buffered read
  iomap: Lift blocksize restriction on atomic writes
  iomap: Support SW-based atomic writes
  iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW
  xfs: flag as supporting FOP_DONTCACHE
  iomap: make buffered writes work with RWF_DONTCACHE
  iomap: introduce a full map advance helper
  iomap: rename iomap_iter processed field to status
  iomap: remove unnecessary advance from iomap_iter()
  dax: advance the iomap_iter on pte and pmd faults
  dax: advance the iomap_iter on dedupe range
  dax: advance the iomap_iter on unshare range
  dax: advance the iomap_iter on zero range
  dax: push advance down into dax_iomap_iter() for read and write
  dax: advance the iomap_iter in the read/write path
  iomap: convert misc simple ops to incremental advance
  iomap: advance the iter on direct I/O
  ...
2025-03-24 10:19:31 -07:00
Andreas Gruenbacher 8cb70b91b2 gfs2: some comment clarifications
Since commit e1fa9ea85c ("gfs2: Stop using glock holder auto-demotion
for now"), we unconditionally drop the inode glock before trying to
fault in more pages.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-18 13:21:39 +01:00
Dan Carpenter 951d701ef1 gfs2: Fix a NULL vs IS_ERR() bug in gfs2_find_jhead()
The filemap_grab_folio() function doesn't return NULL, it returns error
pointers.  Fix the check to match.
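
A minimal userspace sketch of the bug class described above, not the actual gfs2 change. ERR_PTR(), IS_ERR() and PTR_ERR() are reimplemented here in simplified form to mirror the kernel helpers of the same names; grab_object(), use_object_buggy() and use_object_fixed() are hypothetical stand-ins for a lookup that reports failure through an error pointer and for its callers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace copies of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup: on failure it returns an error pointer, never NULL. */
static int the_object = 42;

static void *grab_object(bool fail)
{
	return fail ? ERR_PTR(-ENOMEM) : (void *)&the_object;
}

/* Buggy caller: a NULL check never fires, so the failure goes unnoticed. */
static long use_object_buggy(bool fail)
{
	void *obj = grab_object(fail);

	if (!obj)
		return -ENOMEM;
	return 0;	/* real code would now dereference a bogus pointer */
}

/* Fixed caller: test with IS_ERR() and propagate the encoded errno. */
static long use_object_fixed(bool fail)
{
	void *obj = grab_object(fail);

	if (IS_ERR(obj))
		return PTR_ERR(obj);
	return 0;
}
```

With a failing lookup, the buggy caller reports success while the fixed caller returns -ENOMEM; both behave the same when the lookup succeeds.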

Fixes: 4082976009 ("gfs2: Convert gfs2_find_jhead() to use a folio")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-12 13:00:28 +01:00
Matthew Wilcox (Oracle) 0776a508d1 gfs2: Convert gfs2_meta_read_endio() to use a folio
Switch from bio_for_each_segment_all() to bio_for_each_folio_all()
which removes a call to page_buffers().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle) 536da2a440 gfs2: Convert gfs2_end_log_write_bh() to work on a folio
gfs2_end_log_write() has to handle bios which consist of both pages
which belong to folios and pages which were allocated from a mempool and
do not belong to a folio.  It would be cleaner to have separate endio
handlers which handle each type, but it's not clear to me whether that's
even possible.

This patch is slightly forward-looking in that page_folio() cannot
currently return NULL, but it will return NULL in the future for pages
which do not belong to a folio.

This was the last user of page_has_buffers(), so remove it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle) 4082976009 gfs2: Convert gfs2_find_jhead() to use a folio
Remove a call to grab_cache_page() by using a folio throughout
this function.

[agruenba@redhat.com: Adjust to return value difference between
bio_add_page() and bio_add_folio().]

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle) e00307e8d4 gfs2: Convert gfs2_jhead_pg_srch() to gfs2_jhead_folio_search()
Pass in the folio instead of the page.  Add an assert that this is
not a large folio as we'd need a more complex solution if we wanted to
kmap() each page out of a large folio.  Removes a use of folio->page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[agruenba@redhat.com: Rename gfs2_jhead_folio_srch() to gfs2_jhead_folio_search().]
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle) e6ff5f2089 gfs2: Use b_folio in gfs2_check_magic()
We are preparing to remove bh->b_page.  Use kmap_local_folio() instead
of kmap_local_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle) 072d732c05 gfs2: Use b_folio in gfs2_submit_bhs()
Remove a reference to bh->b_page which is going to be removed soon.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle) 3f2fc848be gfs2: Use b_folio in gfs2_trans_add_meta()
The lock bit is maintained on the folio, not on the page.  Saves two
calls to compound_head() as well as removing two references to
bh->b_page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:39 +01:00
Matthew Wilcox (Oracle) 6576742b90 gfs2: Use b_folio in gfs2_log_write_bh()
We are preparing to remove bh->b_page.  gfs2_log_write() should continue
to operate on pages as some of the memory being logged does not come
from folios, so convert from folio to page in this function.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:39 +01:00
Andreas Gruenbacher 41a8e04c94 gfs2: skip if we cannot defer delete
In gfs2_evict_inode(), in the unlikely case that we cannot defer
deleting the inode, it is not safe to fall back to deleting the inode;
the only valid choice we have is to skip the delete.

In addition, in evict_should_delete(), if we cannot lock the inode glock
exclusively, we are in a bad enough state that skipping the delete is
likely a better choice than trying to recover from the failure later.

Fixes: c5b7a2400e ("gfs2: Only defer deletes when we have an iopen glock")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:38 +01:00
Andreas Gruenbacher 79fe790a32 gfs2: remove redundant warnings
In glock_set_object() and glock_clear_object(), there is no need to
print the glock type and number when we dump the entire glock, anyway.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:38 +01:00
Andreas Gruenbacher e9e38ed725 gfs2: minor evict fix
In evict_should_delete(), when gfs2_upgrade_iopen_glock() fails, we
detach the iopen glock from the inode without calling
glock_clear_object().  This leads to a warning in glock_set_object()
when the same inode is recreated and the glock is reused.
Fix that by only detaching the iopen glock in gfs2_evict_inode().

In addition, remove the dequeue code from evict_should_delete(); we
already perform a conditional dequeue in gfs2_evict_inode().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:38 +01:00
Andreas Gruenbacher 9136cad723 gfs2: Prevent inode creation race (2)
In gfs2_try_evict(), we try grabbing the inode to evict, we try to evict
it, and then we try grabbing it again to see if it still exists.  There
is no guarantee that we will end up with the same inode both times; the
inode validity check that commit ffd1cf0443 ("gfs2: Prevent inode
creation race") added to the first grab is actually needed both times.

(To avoid code duplication, add a grab_existing_inode() helper.)

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:38 +01:00
Andreas Gruenbacher 6cb3b1c2df gfs2: Fix additional unlikely request cancelation race
In gfs2_glock_dq(), we must drop the glock spin lock before calling
->lm_cancel, but this means that in the meantime, the operation we are
trying to cancel could complete.  If the operation completes
unsuccessfully, another holder can end up at the head of the queue and
another ->lm_lock operation can get started.  In this case, we would end
up canceling that second operation by accident.

To prevent that, introduce a new GLF_CANCELING flag.  Set that flag in
gfs2_glock_dq() when trying to cancel an operation.  When seeing that
flag, finish_xmote() will then keep the GLF_LOCK flag set to prevent
other glock operations from taking place.  gfs2_glock_dq() then
completes the cancelation attempt by clearing GLF_LOCK and
GLF_CANCELING.

In addition, add a missing GLF_DEMOTE_IN_PROGRESS check in
gfs2_glock_dq() to make sure that we won't accidentally cancel a demote
request.
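
The GLF_CANCELING handshake described above can be illustrated roughly as follows. This is a single-threaded userspace model under stated assumptions, not the gfs2 code: struct glock_sketch, the bit values, and the three helper functions are hypothetical, and the real implementation also involves the glock spin lock and the ->lm_cancel call, both omitted here.

```c
#include <stdbool.h>

/* Flag names follow the commit message; the bit values are illustrative. */
enum {
	GLF_LOCK      = 1u << 0,
	GLF_CANCELING = 1u << 1,
};

/* Hypothetical, stripped-down stand-in for a glock. */
struct glock_sketch {
	unsigned int flags;
};

/* Canceling side, step 1: mark the cancelation before the lock is dropped. */
static void dq_begin_cancel(struct glock_sketch *gl)
{
	gl->flags |= GLF_CANCELING;
}

/*
 * Completion side: normally releases GLF_LOCK so the next holder can run.
 * While a cancelation is in flight, GLF_LOCK stays set so that no new
 * lock operation can start and be canceled by accident.
 * Returns true if the glock was released for other operations.
 */
static bool finish_xmote_sketch(struct glock_sketch *gl)
{
	if (gl->flags & GLF_CANCELING)
		return false;
	gl->flags &= ~GLF_LOCK;
	return true;
}

/* Canceling side, step 2: finish the attempt by clearing both flags. */
static void dq_end_cancel(struct glock_sketch *gl)
{
	gl->flags &= ~(GLF_LOCK | GLF_CANCELING);
}
```

In this model, a completion that arrives between dq_begin_cancel() and dq_end_cancel() leaves GLF_LOCK set, which is what keeps a second operation from being started in that window.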

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:38 +01:00
Andreas Gruenbacher a431d49243 gfs2: Fix request cancelation bug
In finish_xmote(), when a locking request is canceled, the corresponding
holder is moved to the tail of the holders list instead of being
dequeued immediately.  When there is only a single holder, the canceled
locking request is then immediately repeated.  This makes no sense; it
looks like another remnant of LM_FLAG_PRIORITY support.

Instead, dequeue canceled holders and proceed with the next holder in
finish_xmote().  We can then easily detect in gfs2_glock_dq() when a
holder has been canceled.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10 18:15:38 +01:00