Commit Graph

4532 Commits (f9db1fc56281b96fe8748632b3894de970a8a850)

Author SHA1 Message Date
Chao Yu ba8dac350f f2fs: fix to zero post-eof page
fstest reports a f2fs bug:

generic/363 42s ... [failed, exit status 1]- output mismatch (see /share/git/fstests/results//generic/363.out.bad)
    --- tests/generic/363.out   2025-01-12 21:57:40.271440542 +0800
    +++ /share/git/fstests/results//generic/363.out.bad 2025-05-19 19:55:58.000000000 +0800
    @@ -1,2 +1,78 @@
     QA output created by 363
     fsx -q -S 0 -e 1 -N 100000
    +READ BAD DATA: offset = 0xd6fb, size = 0xf044, fname = /mnt/f2fs/junk
    +OFFSET      GOOD    BAD     RANGE
    +0x1540d     0x0000  0x2a25  0x0
    +operation# (mod 256) for the bad data may be 37
    +0x1540e     0x0000  0x2527  0x1
    ...
    (Run 'diff -u /share/git/fstests/tests/generic/363.out /share/git/fstests/results//generic/363.out.bad'  to see the entire diff)
Ran: generic/363
Failures: generic/363
Failed 1 of 1 tests

The root cause is user can update post-eof page via mmap [1], however, f2fs
missed to zero post-eof page in below operations, so, once it expands i_size,
then it will include dummy data locates previous post-eof page, so during
below operations, we need to zero post-eof page.

Operations which can include dummy data after previous i_size after expanding
i_size:
- write
- mapwrite [1]
- truncate
- fallocate
 * preallocate
 * zero_range
 * insert_range
 * collapse_range
- clone_range (doesn’t support in f2fs)
- copy_range (doesn’t support in f2fs)

[1] https://man7.org/linux/man-pages/man2/mmap.2.html 'BUG section'

Cc: stable@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-06-18 21:35:29 +00:00
Matthew Wilcox (Oracle) 6dea74e454 f2fs: Fix __write_node_folio() conversion
This conversion moved the folio_unlock() to inside __write_node_folio(),
but missed one caller so we had a double-unlock on this path.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Chao Yu <chao@kernel.org>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Reported-by: syzbot+c0dc46208750f063d0e0@syzkaller.appspotmail.com
Fixes: 80f31d2a7e (f2fs: return bool from __write_node_folio)
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-06-10 14:52:21 +00:00
Linus Torvalds d8441523f2 f2fs-for-6.16-rc1
In this round, Matthew converted most of page operations to using folio. Beyond
 the work, we've applied some performance tunings such as GC and linear lookup,
 in addition to enhancing fault injection and sanity checks.
 
 Enhancement:
  - large number of folio conversions
  - add a control to turn on/off the linear lookup for performance
  - tune GC logics for zoned block device
  - improve fault injection and sanity checks
 
 Bug fix:
  - handle error cases of memory donation
  - fix to correct check conditions in f2fs_cross_rename
  - fix to skip f2fs_balance_fs() if checkpoint is disabled
  - don't over-report free space or inodes in statvfs
  - prevent the current section from being selected as a victim during GC
  - fix to calculate first_zoned_segno correctly
  - fix to avoid inconsistence in between SIT and SSA for zoned block device
 
 As usual, there are several debugging patches and clean-ups as well.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmg3PdcACgkQQBSofoJI
 UNL/mQ/9Hkru4XSCokhxt8+/HoFRnTliAlzfD45Vzkkhz1YP7J8VdvWOzJV/WEai
 D3Ib50Q6/y2ptxu7cwOpmToR3fI3RzAlgQsYooFAiZOBnyUkBOLA1oaVuT4s/EYg
 u85xxLx0SW/IMX5CKKbYzhbXnocGAvRUkp/k30kjKJxpCeQ7pw/mLhw/2XeNIb9h
 FxJbECWPpf4PA6ot22YUNvQn0plF/s9873PPhv50vpGyXTHIlTbDCSMeEC1r1E5v
 xWsPcWmTkyPIyBhNFEONWJw1l3wcVIVKNBfBqwMEDr+Tgqi5UDEREeTDV9q5C6y+
 vw3KnsOqX7RTdLExGfefTOnBsTqqMwSZQSH2HL5/Poayg5obXf3D/fUqAQajJpt/
 FbAtfKaXElJcC7l3DJQU3Trh+WpdEPbuMiJo43OzX0YGvMfkA/sYrAHTYm5Q4nsC
 wrRLaWiBgG6nQDKNXz+amD9kL1SMxp+Vsf6ybtChH3gvMqDAJsR7DY1F/Cxe3ry8
 8JoJiGRYq70lw5xNACfJNQwWwRbtySy63nIwMA7FGR9zaXBQJx+cSPhEeLsS+0hI
 zgijgtgRjbfuojlh7qvfFArHEIL4A67Um3RhjHbLWSFhREPaTB0665ElUNTGPe+y
 hVdYtkb0X2ngsYdV/Xdmp/OThpSxI8x1ZCXVsrElawVIMpjP+nA=
 =G8sl
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, Matthew converted most of page operations to using
  folio. Beyond the work, we've applied some performance tunings such as
  GC and linear lookup, in addition to enhancing fault injection and
  sanity checks.

  Enhancements:
   - large number of folio conversions
   - add a control to turn on/off the linear lookup for performance
   - tune GC logics for zoned block device
   - improve fault injection and sanity checks

  Bug fixes:
   - handle error cases of memory donation
   - fix to correct check conditions in f2fs_cross_rename
   - fix to skip f2fs_balance_fs() if checkpoint is disabled
   - don't over-report free space or inodes in statvfs
   - prevent the current section from being selected as a victim during GC
   - fix to calculate first_zoned_segno correctly
   - fix to avoid inconsistence between SIT and SSA for zoned block device

  As usual, there are several debugging patches and clean-ups as well"

* tag 'f2fs-for-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (195 commits)
  f2fs: fix to correct check conditions in f2fs_cross_rename
  f2fs: use d_inode(dentry) cleanup dentry->d_inode
  f2fs: fix to skip f2fs_balance_fs() if checkpoint is disabled
  f2fs: clean up to check bi_status w/ BLK_STS_OK
  f2fs: introduce is_{meta,node}_folio
  f2fs: add ckpt_valid_blocks to the section entry
  f2fs: add a method for calculating the remaining blocks in the current segment in LFS mode.
  f2fs: introduce FAULT_VMALLOC
  f2fs: use vmalloc instead of kvmalloc in .init_{,de}compress_ctx
  f2fs: add f2fs_bug_on() in f2fs_quota_read()
  f2fs: add f2fs_bug_on() to detect potential bug
  f2fs: remove unused sbi argument from checksum functions
  f2fs: fix 32-bits hexademical number in fault injection doc
  f2fs: don't over-report free space or inodes in statvfs
  f2fs: return bool from __write_node_folio
  f2fs: simplify return value handling in f2fs_fsync_node_pages
  f2fs: always unlock the page in f2fs_write_single_data_page
  f2fs: remove wbc->for_reclaim handling
  f2fs: return bool from __f2fs_write_meta_folio
  f2fs: fix to return correct error number in f2fs_sync_node_pages()
  ...
2025-05-30 08:40:25 -07:00
Zhiguo Niu 9883494c45 f2fs: fix to correct check conditions in f2fs_cross_rename
Should be "old_dir" here.

Fixes: 5c57132eaf ("f2fs: support project quota")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28 16:05:25 +00:00
Zhiguo Niu a6c397a31f f2fs: use d_inode(dentry) cleanup dentry->d_inode
no logic changes.

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28 16:05:22 +00:00
Chao Yu c836d3b8d9 f2fs: fix to skip f2fs_balance_fs() if checkpoint is disabled
Syzbot reports a f2fs bug as below:

INFO: task syz-executor328:5856 blocked for more than 144 seconds.
      Not tainted 6.15.0-rc6-syzkaller-00208-g3c21441eeffc #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor328 state:D stack:24392 pid:5856  tgid:5832  ppid:5826   task_flags:0x400040 flags:0x00004006
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5382 [inline]
 __schedule+0x168f/0x4c70 kernel/sched/core.c:6767
 __schedule_loop kernel/sched/core.c:6845 [inline]
 schedule+0x165/0x360 kernel/sched/core.c:6860
 io_schedule+0x81/0xe0 kernel/sched/core.c:7742
 f2fs_balance_fs+0x4b4/0x780 fs/f2fs/segment.c:444
 f2fs_map_blocks+0x3af1/0x43b0 fs/f2fs/data.c:1791
 f2fs_expand_inode_data+0x653/0xaf0 fs/f2fs/file.c:1872
 f2fs_fallocate+0x4f5/0x990 fs/f2fs/file.c:1975
 vfs_fallocate+0x6a0/0x830 fs/open.c:338
 ioctl_preallocate fs/ioctl.c:290 [inline]
 file_ioctl fs/ioctl.c:-1 [inline]
 do_vfs_ioctl+0x1b8f/0x1eb0 fs/ioctl.c:885
 __do_sys_ioctl fs/ioctl.c:904 [inline]
 __se_sys_ioctl+0x82/0x170 fs/ioctl.c:892
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The root cause is after commit 84b5bb8bf0 ("f2fs: modify
f2fs_is_checkpoint_ready logic to allow more data to be written with the
CP disable"), we will get chance to allow f2fs_is_checkpoint_ready() to
return true once below conditions are all true:
1. checkpoint is disabled
2. there are not enough free segments
3. there are enough free blocks

Then it will cause f2fs_balance_fs() to trigger foreground GC.

void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
...
	if (!f2fs_is_checkpoint_ready(sbi))
		return;

And the testcase mounts f2fs image w/ gc_merge,checkpoint=disable, so deadloop
will happen through below race condition:

- f2fs_do_shutdown		- vfs_fallocate				- gc_thread_func
				 - file_start_write
				  - __sb_start_write(SB_FREEZE_WRITE)
				 - f2fs_fallocate
				  - f2fs_expand_inode_data
				   - f2fs_map_blocks
				    - f2fs_balance_fs
				     - prepare_to_wait
				     - wake_up(gc_wait_queue_head)
				     - io_schedule
 - bdev_freeze
  - freeze_super
   - sb->s_writers.frozen = SB_FREEZE_WRITE;
   - sb_wait_write(sb, SB_FREEZE_WRITE);
									 - if (sbi->sb->s_writers.frozen >= SB_FREEZE_WRITE) continue;
									 : cause deadloop

This patch fix to add check condition in f2fs_balance_fs(), so that if
checkpoint is disabled, we will just skip trigger foreground GC to
avoid such deadloop issue.

Meanwhile let's remove f2fs_is_checkpoint_ready() check condition in
f2fs_balance_fs(), since it's redundant, due to the main logic in the
function is to check:
a) whether checkpoint is disabled
b) there is enough free segments

f2fs_balance_fs() still has all logics after f2fs_is_checkpoint_ready()'s
removal.

Reported-by: syzbot+aa5bb5f6860e08a60450@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/682d743a.a00a0220.29bc26.0289.GAE@google.com
Fixes: 84b5bb8bf0 ("f2fs: modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable")
Cc: Qi Han <hanqi@vivo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28 16:04:15 +00:00
Chao Yu 68e7f31eec f2fs: clean up to check bi_status w/ BLK_STS_OK
Check bi_status w/ BLK_STS_OK instead of 0 for cleanup.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28 16:03:39 +00:00
Chao Yu 019a891242 f2fs: introduce is_{meta,node}_folio
Just cleanup, no changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28 16:03:26 +00:00
yohan.joung deecd282bc f2fs: add ckpt_valid_blocks to the section entry
when performing buffered writes in a large section,
overhead is incurred due to the iteration through
ckpt_valid_blocks within the section.
when SEGS_PER_SEC is 128, this overhead accounts for 20% within
the f2fs_write_single_data_page routine.
as the size of the section increases, the overhead also grows.
to handle this problem ckpt_valid_blocks is
added within the section entries.

Test
insmod null_blk.ko nr_devices=1 completion_nsec=1  submit_queues=8
hw_queue_depth=64 max_sectors=512 bs=4096 memory_backed=1
make_f2fs /dev/block/nullb0
make_f2fs -s 128 /dev/block/nullb0
fio --bs=512k --size=1536M --rw=write --name=1
--filename=/mnt/test_dir/seq_write
--ioengine=io_uring --iodepth=64 --end_fsync=1

before
SEGS_PER_SEC 1
2556MiB/s
SEGS_PER_SEC 128
2145MiB/s

after
SEGS_PER_SEC 1
2556MiB/s
SEGS_PER_SEC 128
2556MiB/s

Signed-off-by: yohan.joung <yohan.joung@sk.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28 15:58:49 +00:00
yohan.joung 249ad438e1 f2fs: add a method for calculating the remaining blocks in the current segment in LFS mode.
In LFS mode, the previous segment cannot use invalid blocks,
so the remaining blocks from the next_blkoff of the current segment
to the end of the section are calculated.

Signed-off-by: yohan.joung <yohan.joung@sk.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28 15:58:30 +00:00
Chao Yu 54ca9be0bc f2fs: introduce FAULT_VMALLOC
Introduce a new fault type FAULT_VMALLOC to simulate no memory error in
f2fs_vmalloc().

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-27 23:52:36 +00:00
Chao Yu 70dd07c888 f2fs: use vmalloc instead of kvmalloc in .init_{,de}compress_ctx
.init_{,de}compress_ctx uses kvmalloc() to alloc memory, it will try
to allocate physically continuous page first, it may cause more memory
allocation pressure, let's use vmalloc instead to mitigate it.

[Test]
cd /data/local/tmp
touch file
f2fs_io setflags compression file
f2fs_io getflags file
for i in $(seq 1 10); do sync; echo 3 > /proc/sys/vm/drop_caches;\
time f2fs_io write 512 0 4096 zero osync file; truncate -s 0 file;\
done

[Result]
Before		After		Delta
21.243		21.694		-2.12%

For compression, we recommend to use ioctl to compress file data in
background for workaround.

For decompression, only zstd will be affected.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-27 23:52:35 +00:00
Chao Yu 5827e3c720 f2fs: add f2fs_bug_on() in f2fs_quota_read()
mapping_read_folio_gfp() will return a folio, it should always be
uptodate, let's check folio uptodate status to detect any potenial
bug.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-27 23:52:35 +00:00
Chao Yu 9b6fc9888e f2fs: add f2fs_bug_on() to detect potential bug
Add f2fs_bug_on() to check whether memory preallocation will fail or
not after radix_tree_preload(GFP_NOFS | __GFP_NOFAIL).

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-27 23:52:35 +00:00
Eric Biggers d005af3b67 f2fs: remove unused sbi argument from checksum functions
Since __f2fs_crc32() now calls crc32() directly, it no longer uses its
sbi argument.  Remove that, and simplify its callers accordingly.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-27 23:52:35 +00:00
Chao Yu a920196062 f2fs: don't over-report free space or inodes in statvfs
This fixes an analogus bug that was fixed in modern filesystems:
a) xfs in commit 4b8d867ca6 ("xfs: don't over-report free space or
inodes in statvfs")
b) ext4 in commit f87d3af741 ("ext4: don't over-report free space
or inodes in statvfs")
where statfs can report misleading / incorrect information where
project quota is enabled, and the free space is less than the
remaining quota.

This commit will resolve a test failure in generic/762 which tests
for this bug.

generic/762       - output mismatch (see /share/git/fstests/results//generic/762.out.bad)
    --- tests/generic/762.out   2025-04-15 10:21:53.371067071 +0800
    +++ /share/git/fstests/results//generic/762.out.bad 2025-05-13 16:13:37.000000000 +0800
    @@ -6,8 +6,10 @@
     root blocks2 is in range
     dir blocks2 is in range
     root bavail2 is in range
    -dir bavail2 is in range
    +dir bavail2 has value of 1539066
    +dir bavail2 is NOT in range 304734.87 .. 310891.13
     root blocks3 is in range
    ...
    (Run 'diff -u /share/git/fstests/tests/generic/762.out /share/git/fstests/results//generic/762.out.bad'  to see the entire diff)

HINT: You _MAY_ be missing kernel fix:
      XXXXXXXXXXXXXX xfs: don't over-report free space or inodes in statvfs

Cc: stable@kernel.org
Fixes: ddc34e328d ("f2fs: introduce f2fs_statfs_project")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-13 15:32:41 +00:00
Christian Brauner 1afe9e7da8
f2fs: fix freezing filesystem during resize
Using FREEZE_HOLDER_USERSPACE has two consequences:

(1) If userspace freezes the filesystem after mnt_drop_write_file() but
    before freeze_super() was called filesystem resizing will fail
    because the freeze isn't marked as nestable.

(2) If the kernel has successfully frozen the filesystem via
    FREEZE_HOLDER_USERSPACE userspace can simply undo it by using the
    FITHAW ioctl.

Fix both issues by using FREEZE_HOLDER_KERNEL. It will nest with
FREEZE_HOLDER_USERSPACE and cannot be undone by userspace.

And it is the correct thing to do because the kernel temporarily freezes
the filesystem.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09 12:41:24 +02:00
Christian Brauner 1af3331764
super: add filesystem freezing helpers for suspend and hibernate
Allow the power subsystem to support filesystem freeze for
suspend and hibernate.

For some kernel subsystems it is paramount that they are guaranteed that
they are the owner of the freeze to avoid any risk of deadlocks. This is
the case for the power subsystem. Enable it to recognize whether it did
actually freeze the filesystem.

If userspace has 10 filesystems and suspend/hibernate manges to freeze 5
and then fails on the 6th for whatever odd reason (current or future)
then power needs to undo the freeze of the first 5 filesystems. It can't
just walk the list again because while it's unlikely that a new
filesystem got added in the meantime it still cannot tell which
filesystems the power subsystem actually managed to get a freeze
reference count on that needs to be dropped during thaw.

There's various ways out of this ugliness. For example, record the
filesystems the power subsystem managed to freeze on a temporary list in
the callbacks and then walk that list backwards during thaw to undo the
freezing or make sure that the power subsystem just actually exclusively
freezes things it can freeze and marking such filesystems as being owned
by power for the duration of the suspend or resume cycle. I opted for
the latter as that seemed the clean thing to do even if it means more
code changes.

If hibernation races with filesystem freezing (e.g. DM reconfiguration),
then hibernation need not freeze a filesystem because it's already
frozen but userspace may thaw the filesystem before hibernation actually
happens.

If the race happens the other way around, DM reconfiguration may
unexpectedly fail with EBUSY.

So allow FREEZE_EXCL to nest with other holders. An exclusive freezer
cannot be undone by any of the other concurrent freezers.

Link: https://lore.kernel.org/r/20250329-work-freeze-v2-6-a47af37ecc3d@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09 12:41:02 +02:00
Christoph Hellwig 80f31d2a7e f2fs: return bool from __write_node_folio
__write_node_folio can only return 0 or AOP_WRITEPAGE_ACTIVATE.
As part of phasing out AOP_WRITEPAGE_ACTIVATE, switch to a bool return
instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-08 15:23:46 +00:00
Christoph Hellwig 0638f28b30 f2fs: simplify return value handling in f2fs_fsync_node_pages
Always assign ret where the error happens, and jump to out instead
of multiple loop exit conditions to prepare for changes in the
__write_node_folio calling convention.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-08 15:23:31 +00:00
Christoph Hellwig 84c5d16711 f2fs: always unlock the page in f2fs_write_single_data_page
Consolidate the code to unlock the page in f2fs_write_single_data_page
instead of leaving it to the callers for the AOP_WRITEPAGE_ACTIVATE case.
Replace AOP_WRITEPAGE_ACTIVATE with a positive return of 1 as this case
now doesn't match the historic ->writepage special return code that is
on it's way out now that ->writepage has been removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-08 15:23:18 +00:00
Christoph Hellwig 402dd9f02c f2fs: remove wbc->for_reclaim handling
Since commits 7ff0104a80 ("f2fs: Remove f2fs_write_node_page()") and
3b47398d98 ("f2fs: Remove f2fs_write_meta_page()'), f2fs can't be
called from reclaim context any more.  Remove all code keyed of the
wbc->for_reclaim flag, which is now only set for writing out swap or
shmem pages inside the swap code, but never passed to file systems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-08 15:22:45 +00:00
Christoph Hellwig 39122e4544 f2fs: return bool from __f2fs_write_meta_folio
__f2fs_write_meta_folio can only return 0 or AOP_WRITEPAGE_ACTIVATE.
As part of phasing out AOP_WRITEPAGE_ACTIVATE, switch to a bool return
instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-08 15:22:28 +00:00
Chao Yu 43ba56a043 f2fs: fix to return correct error number in f2fs_sync_node_pages()
If __write_node_folio() failed, it will return AOP_WRITEPAGE_ACTIVATE,
the incorrect return value may be passed to userspace in below path,
fix it.

- sync_filesystem
 - sync_fs
  - f2fs_issue_checkpoint
   - block_operations
    - f2fs_sync_node_pages
     - __write_node_folio
     : return AOP_WRITEPAGE_ACTIVATE

Cc: stable@vger.kernel.org
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-08 15:21:58 +00:00
Kairui Song 0427e811c9 f2fs: drop usage of folio_index
folio_index is only needed for mixed usage of page cache and swap
cache, for pure page cache usage, the caller can just use
folio->index instead.

It can't be a swap cache folio here.  Swap mapping may only call into fs
through `swap_rw` but f2fs does not use that method for swap.

Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org> (maintainer:F2FS FILE SYSTEM)
Cc: Chao Yu <chao@kernel.org> (maintainer:F2FS FILE SYSTEM)
Cc: linux-f2fs-devel@lists.sourceforge.net (open list:F2FS FILE SYSTEM)
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06 15:46:55 +00:00
Chao Yu 0244c77fed f2fs: support FAULT_TIMEOUT
Support to inject a timeout fault into function, currently it only
support to inject timeout to commit_atomic_write flow to reproduce
inconsistent bug, like the bug fixed by commit f098aeba04 ("f2fs:
fix to avoid atomicity corruption of atomic file").

By default, the new type fault will inject 1000ms timeout, and the
timeout process can be interrupted by SIGKILL.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06 15:46:55 +00:00
Daeho Jeong cf7cd17c97 f2fs: handle error cases of memory donation
In cases of removing memory donation, we need to handle some error cases
like ENOENT and EACCES (indicating the range already has been donated).

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06 15:46:55 +00:00
Chao Yu bb5eb8a5b2 f2fs: fix to bail out in get_new_segment()
------------[ cut here ]------------
WARNING: CPU: 3 PID: 579 at fs/f2fs/segment.c:2832 new_curseg+0x5e8/0x6dc
pc : new_curseg+0x5e8/0x6dc
Call trace:
 new_curseg+0x5e8/0x6dc
 f2fs_allocate_data_block+0xa54/0xe28
 do_write_page+0x6c/0x194
 f2fs_do_write_node_page+0x38/0x78
 __write_node_page+0x248/0x6d4
 f2fs_sync_node_pages+0x524/0x72c
 f2fs_write_checkpoint+0x4bc/0x9b0
 __checkpoint_and_complete_reqs+0x80/0x244
 issue_checkpoint_thread+0x8c/0xec
 kthread+0x114/0x1bc
 ret_from_fork+0x10/0x20

get_new_segment() detects inconsistent status in between free_segmap
and free_secmap, let's record such error into super block, and bail
out get_new_segment() instead of continue using the segment.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06 15:46:54 +00:00
Chao Yu 617e0491ab f2fs: sysfs: export linear_lookup in features directory
cat /sys/fs/f2fs/features/linear_lookup
supported

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06 15:46:54 +00:00
Chao Yu 3fea0641b0 f2fs: sysfs: add encoding_flags entry
This patch adds a new sysfs entry /sys/fs/f2fs/<disk>/encoding_flags,
it is a read-only entry to show the value of sb.s_encoding_flags, the
value is hexadecimal.

============================     ==========
Flag_Name                        Flag_Value
============================     ==========
SB_ENC_STRICT_MODE_FL            0x00000001
SB_ENC_NO_COMPAT_FALLBACK_FL     0x00000002
============================     ==========

case#1
mkfs.f2fs -f -O casefold -C utf8:strict /dev/vda
mount /dev/vda /mnt/f2fs
cat /sys/fs/f2fs/vda/encoding_flags
1

case#2
mkfs.f2fs -f -O casefold -C utf8 /dev/vda
fsck.f2fs --nolinear-lookup=1 /dev/vda
mount /dev/vda /mnt/f2fs
cat /sys/fs/f2fs/vda/encoding_flags
2

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06 15:46:52 +00:00
Chao Yu dc6d9ef57f f2fs: zone: fix to calculate first_zoned_segno correctly
A zoned device can has both conventional zones and sequential zones,
so we should not treat first segment of zoned device as first_zoned_segno,
instead, we need to check zone type for each zone during traversing zoned
device to find first_zoned_segno.

Otherwise, for below case, first_zoned_segno will be 0, which could be
wrong.

create_null_blk 512 2 1024 1024
mkfs.f2fs -m /dev/nullb0

Testcase:

export SCRIPTS_PATH=/share/git/scripts

test multiple devices w/ zoned device
for ((i=0;i<8;i++)) do {
	zonesize=$((2<<$i))
	conzone=$((4096/$zonesize))
	seqzone=$((4096/$zonesize))
	$SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone
	mkfs.f2fs -f -m /dev/vdb -c /dev/nullb0
	mount /dev/vdb /mnt/f2fs
	touch /mnt/f2fs/file
	f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2))
	stat /mnt/f2fs/file
	df
	cat /proc/fs/f2fs/vdb/segment_info
	umount /mnt/f2fs
	$SCRIPTS_PATH/nullblk_remove.sh 0
} done

test single zoned device
for ((i=0;i<8;i++)) do {
	zonesize=$((2<<$i))
	conzone=$((4096/$zonesize))
	seqzone=$((4096/$zonesize))
	$SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone
	mkfs.f2fs -f -m /dev/nullb0
	mount /dev/nullb0 /mnt/f2fs
	touch /mnt/f2fs/file
	f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2))
	stat /mnt/f2fs/file
	df
	cat /proc/fs/f2fs/nullb0/segment_info
	umount /mnt/f2fs
	$SCRIPTS_PATH/nullblk_remove.sh 0
} done

Fixes: 9703d69d9d ("f2fs: support file pinning for zoned devices")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:48 +00:00
Chao Yu 5db0d252c6 f2fs: fix to do sanity check on sit_bitmap_size
w/ below testcase, resize will generate a corrupted image which
contains inconsistent metadata, so when mounting such image, it
will trigger kernel panic:

touch img
truncate -s $((512*1024*1024*1024)) img
mkfs.f2fs -f img $((256*1024*1024))
resize.f2fs -s -i img -t $((1024*1024*1024))
mount img /mnt/f2fs

------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.h:863!
Oops: invalid opcode: 0000 [#1] SMP PTI
CPU: 11 UID: 0 PID: 3922 Comm: mount Not tainted 6.15.0-rc1+ #191 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:f2fs_ra_meta_pages+0x47c/0x490

Call Trace:
 f2fs_build_segment_manager+0x11c3/0x2600
 f2fs_fill_super+0xe97/0x2840
 mount_bdev+0xf4/0x140
 legacy_get_tree+0x2b/0x50
 vfs_get_tree+0x29/0xd0
 path_mount+0x487/0xaf0
 __x64_sys_mount+0x116/0x150
 do_syscall_64+0x82/0x190
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fdbfde1bcfe

The reaseon is:

sit_i->bitmap_size is 192, so size of sit bitmap is 192*8=1536, at maximum
there are 1536 sit blocks, however MAIN_SEGS is 261893, so that sit_blk_cnt
is 4762, build_sit_entries() -> current_sit_addr() tries to access
out-of-boundary in sit_bitmap at offset from [1536, 4762), once sit_bitmap
and sit_bitmap_mirror is not the same, it will trigger f2fs_bug_on().

Let's add sanity check in f2fs_sanity_check_ckpt() to avoid panic.

Cc: stable@vger.kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:48 +00:00
Chao Yu aa1be8dd64 f2fs: fix to detect gcing page in f2fs_is_cp_guaranteed()
Jan Prusakowski reported a f2fs bug as below:

f2fs/007 will hang kernel during testing w/ below configs:

kernel 6.12.18 (from pixel-kernel/android16-6.12)
export MKFS_OPTIONS="-O encrypt -O extra_attr -O project_quota -O quota"
export F2FS_MOUNT_OPTIONS="test_dummy_encryption,discard,fsync_mode=nobarrier,reserve_root=32768,checkpoint_merge,atgc"

cat /proc/<umount_proc_id>/stack
f2fs_wait_on_all_pages+0xa3/0x130
do_checkpoint+0x40c/0x5d0
f2fs_write_checkpoint+0x258/0x550
kill_f2fs_super+0x14f/0x190
deactivate_locked_super+0x30/0xb0
cleanup_mnt+0xba/0x150
task_work_run+0x59/0xa0
syscall_exit_to_user_mode+0x12d/0x130
do_syscall_64+0x57/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e

cat /sys/kernel/debug/f2fs/status

  - IO_W (CP: -256, Data:  256, Flush: (   0    0    1), Discard: (   0    0)) cmd:    0 undiscard:   0

CP IOs reference count becomes negative.

The root cause is:

After 4961acdd65 ("f2fs: fix to tag gcing flag on page during block
migration"), we will tag page w/ gcing flag for raw page of cluster
during its migration.

However, if the inode is both encrypted and compressed, during
ioc_decompress(), it will tag page w/ gcing flag, and it increase
F2FS_WB_DATA reference count:
- f2fs_write_multi_page
 - f2fs_write_raw_page
  - f2fs_write_single_page
   - do_write_page
    - f2fs_submit_page_write
     - WB_DATA_TYPE(bio_page, fio->compressed_page)
     : bio_page is encrypted, so mapping is NULL, and fio->compressed_page
       is NULL, it returns F2FS_WB_DATA
     - inc_page_count(.., F2FS_WB_DATA)

Then, during end_io(), it decrease F2FS_WB_CP_DATA reference count:
- f2fs_write_end_io
 - f2fs_compress_write_end_io
  - fscrypt_pagecache_folio
  : get raw page from encrypted page
  - WB_DATA_TYPE(&folio->page, false)
  : raw page has gcing flag, it returns F2FS_WB_CP_DATA
  - dec_page_count(.., F2FS_WB_CP_DATA)

In order to fix this issue, we need to detect gcing flag in raw page
in f2fs_is_cp_guaranteed().

Fixes: 4961acdd65 ("f2fs: fix to tag gcing flag on page during block migration")
Reported-by: Jan Prusakowski <jprusakowski@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:48 +00:00
Chao Yu 0c708e35cf f2fs: clean up w/ fscrypt_is_bounce_page()
Just cleanup, no logic changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:48 +00:00
Matthew Wilcox (Oracle) f16ebe0de7 f2fs: Convert clear_node_page_dirty() to clear_node_folio_dirty()
Both callers have a folio so pass it in, removing five calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:48 +00:00
Matthew Wilcox (Oracle) a4d0770271 f2fs: Use a folio in flush_inline_data()
Get a folio from the page cache and use it throughout.  Removes
six calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:47 +00:00
Matthew Wilcox (Oracle) 6b1ad39545 f2fs: Remove f2fs_new_node_page()
All callers now use f2fs_new_node_folio().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:47 +00:00
Matthew Wilcox (Oracle) 963da02bc1 f2fs: Convert fsync_node_entry->page to folio
Convert all callers to set/get a folio instead of a page.  Removes
five calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:47 +00:00
Matthew Wilcox (Oracle) 7d28f13c58 f2fs: Pass a folio to get_dnode_addr()
All callers except __get_inode_rdev() and __set_inode_rdev() now have a
folio, but the only callers of those two functions do have a folio, so
pass the folio to them and then into get_dnode_addr().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:47 +00:00
Matthew Wilcox (Oracle) 6f7ec66180 f2fs: Convert dnode_of_data->node_page to node_folio
All assignments to this struct member are conversions from a folio
so convert it to be a folio and convert all users.  At the same time,
convert data_blkaddr() to take a folio as all callers now have a folio.
Remove eight calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:47 +00:00
Matthew Wilcox (Oracle) 66bca01bc5 f2fs: Pass a folio to set_nid()
All callers have a folio, so pass it in.  Removes two calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:47 +00:00
Matthew Wilcox (Oracle) f92379289f f2fs: Pass a folio to f2fs_update_inode()
All callers now have a folio, so pass it in.  Remove two calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:47 +00:00
Matthew Wilcox (Oracle) a6d26d5c75 f2fs: Return a folio from f2fs_init_inode_metadata()
Convert all three callers to handle a folio return.  Remove three
calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:46 +00:00
Matthew Wilcox (Oracle) 398c7df7bc f2fs: Pass a folio to f2fs_init_read_extent_tree()
The only caller alredy has a folio so pass it in.  Remove two calls
to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:46 +00:00
Matthew Wilcox (Oracle) 97e1b86169 f2fs: Use a folio in f2fs_wait_on_block_writeback()
Fetch a folio from the pagecache and use it.  Removes two calls
to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:46 +00:00
Matthew Wilcox (Oracle) 5951fee46b f2fs: Use a folio in redirty_blocks()
Support large folios & simplify the loops in redirty_blocks().
Use the folio APIs and remove four calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:46 +00:00
Matthew Wilcox (Oracle) b02a903218 f2fs: Use a folio in f2fs_encrypt_one_page()
Fetch a folio from the page cache instead of a page.  Removes two calls
to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:46 +00:00
Matthew Wilcox (Oracle) 842974808a f2fs: Convert f2fs_load_compressed_page() to f2fs_load_compressed_folio()
The only caller already has a folio, so pass it in.  Copy the entire
size of the folio to support large block sizes.  Remove two calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:46 +00:00
Matthew Wilcox (Oracle) 75de20f41f f2fs: Use a folio in prepare_compress_overwrite()
Add f2fs_filemap_get_folio() as a wrapper around __filemap_get_folio()
which can inject an error.  Removes seven calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:46 +00:00
Matthew Wilcox (Oracle) 3d56058c55 f2fs: Use a folio in f2fs_cache_compressed_page()
Look up a folio instead of a page, and if that fails, allocate a folio.
Removes five calls to compound_head(), one of the last few references to
add_to_page_cache_lru() and honours the cpuset_do_page_mem_spread()
setting.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:45 +00:00