Commit Graph

880 Commits (09cfd3c52ea76f43b3cb15e570aeddf633d65e80)

Author SHA1 Message Date
Linus Torvalds 56e7b31071 vfs-6.18-rc1.inode
Merge tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs inode updates from Christian Brauner:
 "This contains a series I originally wrote and that Eric brought over
  the finish line. It moves the i_crypt_info and i_verity_info
  pointers out of 'struct inode' and into the fs-specific part of the
  inode.

  So now the few filesystems that actually make use of this pay the price
  in their own private inode storage instead of forcing it upon every
  user of struct inode.

  The pointer for the crypt and verity info is simply found by storing
  an offset to its address in struct fsverity_operations and struct
  fscrypt_operations. This shrinks struct inode by 16 bytes.

  I hope to move a lot more out of it in the future so that struct inode
  becomes really just about very core stuff that we need, much like
  struct dentry and struct file, instead of the dumping ground it has
  become over the years.

  On top of this are various changes associated with the ongoing inode
  lifetime handling rework that multiple people are pushing forward:

   - Stop accessing inode->i_count directly in f2fs and gfs2. They
     simply should use the __iget() and iput() helpers

   - Make the i_state flags an enum

   - Rework the iput() logic

     Currently, if we are the last iput, and we have the I_DIRTY_TIME
     bit set, we will grab a reference on the inode again and then mark
     it dirty and then redo the put. This is to make sure we delay the
     time update for as long as possible

     We can rework this logic to simply dec i_count if it is not 1, and
     if it is do the time update while still holding the i_count
     reference

     Then we can replace the atomic_dec_and_lock with locking the
     ->i_lock and doing atomic_dec_and_test, since we did the
     atomic_add_unless above

   - Add an icount_read() helper and convert everyone that accesses
     inode->i_count directly for this purpose to use the helper

   - Expand dump_inode() to dump more information about an inode helping
     in debugging

   - Add some might_sleep() annotations to iput() and associated
     helpers"

* tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: add might_sleep() annotation to iput() and more
  fs: expand dump_inode()
  inode: fix whitespace issues
  fs: add an icount_read helper
  fs: rework iput logic
  fs: make the i_state flags an enum
  fs: stop accessing ->i_count directly in f2fs and gfs2
  fsverity: check IS_VERITY() in fsverity_cleanup_inode()
  fs: remove inode::i_verity_info
  btrfs: move verity info pointer to fs-specific part of inode
  f2fs: move verity info pointer to fs-specific part of inode
  ext4: move verity info pointer to fs-specific part of inode
  fsverity: add support for info in fs-specific part of inode
  fs: remove inode::i_crypt_info
  ceph: move crypt info pointer to fs-specific part of inode
  ubifs: move crypt info pointer to fs-specific part of inode
  f2fs: move crypt info pointer to fs-specific part of inode
  ext4: move crypt info pointer to fs-specific part of inode
  fscrypt: add support for info in fs-specific part of inode
  fscrypt: replace raw loads of info pointer with helper function
2025-09-29 09:42:30 -07:00
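
The reworked iput() flow described in the merge message above can be sketched in userspace C. This is only an illustration under stated assumptions: struct toy_inode, toy_add_unless() and toy_iput() are hypothetical names, C11 atomics stand in for the kernel's atomic_t helpers, and the ->i_lock step and any retry handling for concurrent users are reduced to a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the few inode fields this sketch needs (hypothetical). */
struct toy_inode {
	atomic_int count;   /* plays the role of i_count */
	bool dirty_time;    /* plays the role of the I_DIRTY_TIME bit */
	int time_updates;   /* how many times the delayed time update ran */
	bool evicted;       /* set once the last reference has been dropped */
};

/* Userspace analogue of atomic_add_unless(): add 'a' unless the value is 'u'. */
static bool toy_add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, 'cur' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

static void toy_iput(struct toy_inode *inode)
{
	assert(atomic_load(&inode->count) > 0);

	/* Not the last reference: simply decrement and return. */
	if (toy_add_unless(&inode->count, -1, 1))
		return;

	/* Last reference: do the time update while still holding it. */
	if (inode->dirty_time) {
		inode->dirty_time = false;
		inode->time_updates++;
	}

	/* The kernel would take ->i_lock before this final decrement. */
	if (atomic_fetch_sub(&inode->count, 1) == 1)
		inode->evicted = true;
}
```

With two references held, the first toy_iput() only decrements the count; the second performs the delayed time update and then drops the final reference.
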
Mateusz Guzik f99b391778
fs: rename generic_delete_inode() and generic_drop_inode()
generic_delete_inode() is rather misleading for what the routine is
doing. inode_just_drop() should be much clearer.

The new naming is inconsistent with generic_drop_inode(), so rename that
one as well with inode_ as the prefix.

No functional changes.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 16:09:42 +02:00
Josef Bacik bc986b1d75
fs: stop accessing ->i_count directly in f2fs and gfs2
Instead of accessing ->i_count directly in these file systems, use the
appropriate __iget and iput helpers.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/b8e6eb8a3e690ce082828d3580415bf70dfa93aa.1755806649.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-27 13:12:48 +02:00
Eric Biggers 1f66cef4a9
f2fs: move verity info pointer to fs-specific part of inode
Move the fsverity_info pointer into the filesystem-specific part of the
inode by adding the field f2fs_inode_info::i_verity_info and configuring
fsverity_operations::inode_info_offs accordingly.

This is a prerequisite for a later commit that removes
inode::i_verity_info, saving memory and improving cache efficiency on
filesystems that don't support fsverity.

Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/20250810075706.172910-11-ebiggers@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-21 13:58:08 +02:00
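
The idea of locating the info pointer through an offset stored in the operations structure, as in the entry above, might look roughly like the following userspace sketch. All toy_* names are hypothetical; only the field name inode_info_offs is taken from the commit message, and the real kernel helpers may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Generic inode: no longer carries a verity info pointer. */
struct toy_inode {
	unsigned long i_ino;
};

struct toy_verity_info {
	int tree_levels;
};

/* Filesystem-specific inode that embeds the generic one. */
struct toy_fs_inode {
	struct toy_verity_info *i_verity_info;
	struct toy_inode vfs_inode;
};

/* Operations table records where the pointer lives relative to the inode. */
struct toy_verity_ops {
	ptrdiff_t inode_info_offs;
};

static const struct toy_verity_ops toy_ops = {
	.inode_info_offs =
		(ptrdiff_t)offsetof(struct toy_fs_inode, i_verity_info) -
		(ptrdiff_t)offsetof(struct toy_fs_inode, vfs_inode),
};

/* Return the address of the info pointer for a generic inode. */
static struct toy_verity_info **
toy_verity_info_addr(const struct toy_verity_ops *ops, struct toy_inode *inode)
{
	assert(ops != NULL && inode != NULL);
	return (struct toy_verity_info **)((char *)inode + ops->inode_info_offs);
}
```

Generic code only ever sees a struct toy_inode pointer, yet it reaches the field in the containing filesystem-specific structure, so filesystems that do not support the feature pay no per-inode cost.
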
Eric Biggers 7afb71ee92
f2fs: move crypt info pointer to fs-specific part of inode
Move the fscrypt_inode_info pointer into the filesystem-specific part of
the inode by adding the field f2fs_inode_info::i_crypt_info and
configuring fscrypt_operations::inode_info_offs accordingly.

This is a prerequisite for a later commit that removes
inode::i_crypt_info, saving memory and improving cache efficiency with
filesystems that don't support fscrypt.

Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/20250810075706.172910-5-ebiggers@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-21 13:58:07 +02:00
Jaegeuk Kim 078cad8212 f2fs: drop inode from the donation list when the last file is closed
Let's drop the inode from the donation list when there is no other
open file.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-30 17:13:12 +00:00
Hongbo Li 94b3ce7f15 f2fs: switch to the new mount api
The new mount api will execute .parse_param, .init_fs_context, .get_tree
and will call .remount if remount happened. So we add the necessary
functions for the fs_context_operations. If .init_fs_context is added,
the old .mount should be removed.

See Documentation/filesystems/mount_api.rst for more information.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[hongbo: context modified]
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:58:15 +00:00
Hongbo Li bb463a75ab f2fs: introduce fs_context_operation structure
The handle_mount_opt() helper is used to parse mount parameters,
and so we can rename this function to f2fs_parse_param() and set
it as .parse_param in fs_context_operations.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:58:14 +00:00
Hongbo Li d185351325 f2fs: separate the options parsing and options checking
The new mount api separates option parsing and super block setup
into two distinct steps and so we need to separate the options
parsing out of the parse_options().

In order to achieve this, here we handle the mount options with
three steps:
  - Firstly, we move sb/sbi out of handle_mount_opt.
    As the former patch introduced f2fs_fs_context, we record
    the changed mount options in this context. In handle_mount_opt,
    sb/sbi is null, so we should move all related code out of
    handle_mount_opt (thus, any checks which use sb/sbi should
    move out as well).
  - Secondly, we introduce some check helpers to keep the options
    consistent.
    While filling the superblock, sb/sbi are ready. So we check
    the f2fs_fs_context, which holds the mount options, based on sb/sbi.
  - Thirdly, we apply the new mount options to sb/sbi.
    After checking the f2fs_fs_context, all changes to the mount options
    are valid. So we can apply them to sb/sbi directly.

After doing this, option parsing and super block setup have been
decoupled. The original execution flow should also have been
retained.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port, minor fixes and updates]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[hongbo: minor fixes]
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:58:14 +00:00
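
The three steps listed in the entry above (record, check, apply) can be illustrated with a small userspace sketch. It is not the f2fs code: the toy_* structures, the two option bits and the rule that an option needs a matching feature bit are all assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_OPT_COMPRESS (1u << 0)
#define TOY_OPT_QUOTA    (1u << 1)

/* Hypothetical context: requested option values plus a mask of touched bits. */
struct toy_fs_context {
	unsigned int opt;
	unsigned int opt_mask;
};

/* Hypothetical superblock info: on-disk features and active mount options. */
struct toy_sb_info {
	unsigned int features;
	unsigned int mount_opt;
};

/* Step 1: record one option in the context; no sb/sbi is available yet. */
static int toy_parse_param(struct toy_fs_context *ctx, const char *key)
{
	int set = strncmp(key, "no", 2) != 0;
	const char *name = set ? key : key + 2;
	unsigned int bit;

	if (strcmp(name, "compress") == 0)
		bit = TOY_OPT_COMPRESS;
	else if (strcmp(name, "quota") == 0)
		bit = TOY_OPT_QUOTA;
	else
		return -EINVAL;

	ctx->opt_mask |= bit;
	if (set)
		ctx->opt |= bit;
	else
		ctx->opt &= ~bit;
	return 0;
}

/* Step 2: validate the recorded options once the superblock has been read. */
static int toy_check_opts(const struct toy_fs_context *ctx,
			  const struct toy_sb_info *sbi)
{
	unsigned int enabled = ctx->opt & ctx->opt_mask;

	/* In this toy model, enabling an option needs the matching feature. */
	return (enabled & ~sbi->features) ? -EINVAL : 0;
}

/* Step 3: apply only the options that were touched during parsing. */
static void toy_apply_opts(const struct toy_fs_context *ctx,
			   struct toy_sb_info *sbi)
{
	assert(toy_check_opts(ctx, sbi) == 0);
	sbi->mount_opt = (sbi->mount_opt & ~ctx->opt_mask) |
			 (ctx->opt & ctx->opt_mask);
}
```

Keeping a mask next to the values lets the apply step distinguish "explicitly turned off" from "not mentioned", which matters when existing options must be preserved.
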
Hongbo Li 1a9094b10c f2fs: Add f2fs_fs_context to record the mount options
At the parsing phase of mount in the new mount api, option
values will be recorded in the context, and then they will be
used in fill_super and other helpers.

Note that this is a temporary state: we want to remove the sb
and sbi usages in handle_mount_opt. So here the f2fs_fs_context
only records the mount options; they will be copied into sb/sbi
later in the process. (At this point in the series, mount options
are temporarily not set during mount.)

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port, minor fixes and updates]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[hongbo: minor cleanup]
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:58:14 +00:00
Hongbo Li 19c4b380f2 f2fs: Allow sbi to be NULL in f2fs_printk
At the parsing phase of the new mount api, sbi will not be
available. So allow sbi to be NULL in the f2fs log helpers
and use that in handle_mount_opt().

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:58:14 +00:00
Hongbo Li 02eb5fe42a f2fs: move the option parser into handle_mount_opt
In handle_mount_opt, we use fs_parameter to parse each option.
However we're still using the old API to get the options string.
Using fsparams parse_options allows us to remove many of the Opt_
enums, so remove them.

The checkpoint disable cap (or percent) involves rather complex
parsing; we retain the old match_table mechanism for this, which
handles it well.

There are some changes to option parsing:
  1. For `active_logs`, `inline_xattr_size` and `fault_injection`,
     we use the s32 type, according to the internal structure, to
     record the option's value.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port, minor fixes and updates]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[hongbo: minor cleanup]
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:58:14 +00:00
Hongbo Li f2091cc188 f2fs: Add fs parameter specifications for mount options
Use an array of `fs_parameter_spec` called f2fs_param_specs to
hold the mount option specifications for the new mount api.

Add constant_table structures for several options to facilitate
parsing.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
[sandeen: forward port, minor fixes and updates, more fsparam_enum]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:58:13 +00:00
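
A table-driven option specification of the kind described in the entry above might be modelled as follows. This is a simplified userspace sketch: the toy_* types are hypothetical and much smaller than the kernel's fs_parameter_spec and constant_table, and the three options are picked only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_param_type { TOY_PARAM_FLAG, TOY_PARAM_S32, TOY_PARAM_ENUM };
enum toy_opt { TOY_OPT_DISCARD, TOY_OPT_ACTIVE_LOGS, TOY_OPT_MODE };
enum toy_mode { TOY_MODE_ADAPTIVE, TOY_MODE_LFS };

/* Maps a textual option value to a constant. */
struct toy_constant {
	const char *name;
	int value;
};

/* One entry per mount option: its name, value type and option id. */
struct toy_param_spec {
	const char *name;
	enum toy_param_type type;
	enum toy_opt opt;
	const struct toy_constant *values; /* only used for TOY_PARAM_ENUM */
};

static const struct toy_constant toy_mode_values[] = {
	{ "adaptive", TOY_MODE_ADAPTIVE },
	{ "lfs", TOY_MODE_LFS },
	{ NULL, 0 }, /* terminator */
};

static const struct toy_param_spec toy_param_specs[] = {
	{ "discard", TOY_PARAM_FLAG, TOY_OPT_DISCARD, NULL },
	{ "active_logs", TOY_PARAM_S32, TOY_OPT_ACTIVE_LOGS, NULL },
	{ "mode", TOY_PARAM_ENUM, TOY_OPT_MODE, toy_mode_values },
	{ NULL, TOY_PARAM_FLAG, TOY_OPT_DISCARD, NULL }, /* terminator */
};

/* Find the specification for an option name, or NULL if unknown. */
static const struct toy_param_spec *toy_find_spec(const char *name)
{
	const struct toy_param_spec *p;

	assert(name != NULL);
	for (p = toy_param_specs; p->name; p++)
		if (strcmp(p->name, name) == 0)
			return p;
	return NULL;
}

/* Translate a textual value through a constant table. */
static int toy_lookup_constant(const struct toy_constant *tbl,
			       const char *s, int not_found)
{
	for (; tbl->name; tbl++)
		if (strcmp(tbl->name, s) == 0)
			return tbl->value;
	return not_found;
}
```

Because the names, types and allowed values live in data, the parser itself can stay generic and adding an option is mostly a matter of adding a table entry.
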
Jiazi Li e9705c61b1 f2fs: use kfree() instead of kvfree() to free some memory
options in f2fs_fill_super is allocated by kstrdup:
	options = kstrdup((const char *)data, GFP_KERNEL)
sit_bitmap[_mir], nat_bitmap[_mir] are allocated by kmemdup:
	sit_i->sit_bitmap = kmemdup(src_bitmap, sit_bitmap_size, GFP_KERNEL);
	sit_i->sit_bitmap_mir = kmemdup(src_bitmap,
					sit_bitmap_size, GFP_KERNEL);
	nm_i->nat_bitmap = kmemdup(version_bitmap, nm_i->bitmap_size,
					GFP_KERNEL);
	nm_i->nat_bitmap_mir = kmemdup(version_bitmap, nm_i->bitmap_size,
					GFP_KERNEL);
write_io is allocated by f2fs_kmalloc:
	sbi->write_io[i] = f2fs_kmalloc(sbi,
			array_size(n, sizeof(struct f2fs_bio_info))

Using kfree() is more efficient.

Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com>
Signed-off-by: peixuan.qiu <peixuan.qiu@transsion.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-09 17:59:39 +00:00
Swarna Prabhu 1f13689026 f2fs: Fix the typos in comments
This patch fixes minor typos in comments in f2fs.

Signed-off-by: Swarna Prabhu <s.prabhu@samsung.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-06-24 21:34:37 +00:00
Chao Yu 59c1c89e9b f2fs: introduce reserved_pin_section sysfs entry
This patch introduces /sys/fs/f2fs/<dev>/reserved_pin_section for tuning
@needed parameter of has_not_enough_free_secs(), if we configure it w/
zero, it can avoid f2fs_gc() as much as possible while fallocating on
pinned file.

Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: wangzijie <wangzijie1@honor.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-06-23 22:13:02 +00:00
Sheng Yong 554d9b7242 f2fs: fix bio memleak when committing super block
When committing new super block, bio is allocated but not freed, and
kmemleak complains:

  unreferenced object 0xffff88801d185600 (size 192):
    comm "kworker/3:2", pid 128, jiffies 4298624992
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 80 67 c3 00 81 88 ff ff  .........g......
      01 08 06 00 00 00 00 00 00 00 00 00 01 00 00 00  ................
    backtrace (crc 650ecdb1):
      kmem_cache_alloc_noprof+0x3a9/0x460
      mempool_alloc_noprof+0x12f/0x310
      bio_alloc_bioset+0x1e2/0x7e0
      __f2fs_commit_super+0xe0/0x370
      f2fs_commit_super+0x4ed/0x8c0
      f2fs_record_error_work+0xc7/0x190
      process_one_work+0x7db/0x1970
      worker_thread+0x518/0xea0
      kthread+0x359/0x690
      ret_from_fork+0x34/0x70
      ret_from_fork_asm+0x1a/0x30

The issue can be reproduced by:

  mount /dev/vda /mnt
  i=0
  while :; do
      echo '[h]abc' > /sys/fs/f2fs/vda/extension_list
      echo '[h]!abc' > /sys/fs/f2fs/vda/extension_list
      echo scan > /sys/kernel/debug/kmemleak
      dmesg | grep "new suspected memory leaks"
      [ $? -eq 0 ] && break
      i=$((i + 1))
      echo "$i"
  done
  umount /mnt

Fixes: 5bcde45578 ("f2fs: get rid of buffer_head use")
Signed-off-by: Sheng Yong <shengyong1@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-06-23 22:13:01 +00:00
Zhiguo Niu a6c397a31f f2fs: use d_inode(dentry) cleanup dentry->d_inode
no logic changes.

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-28 16:05:22 +00:00
Chao Yu 54ca9be0bc f2fs: introduce FAULT_VMALLOC
Introduce a new fault type FAULT_VMALLOC to simulate no memory error in
f2fs_vmalloc().

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-27 23:52:36 +00:00
Chao Yu 5827e3c720 f2fs: add f2fs_bug_on() in f2fs_quota_read()
mapping_read_folio_gfp() will return a folio, it should always be
uptodate, let's check folio uptodate status to detect any potential
bug.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-27 23:52:35 +00:00
Eric Biggers d005af3b67 f2fs: remove unused sbi argument from checksum functions
Since __f2fs_crc32() now calls crc32() directly, it no longer uses its
sbi argument.  Remove that, and simplify its callers accordingly.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-27 23:52:35 +00:00
Chao Yu a920196062 f2fs: don't over-report free space or inodes in statvfs
This fixes an analogous bug that was fixed in modern filesystems:
a) xfs in commit 4b8d867ca6 ("xfs: don't over-report free space or
inodes in statvfs")
b) ext4 in commit f87d3af741 ("ext4: don't over-report free space
or inodes in statvfs")
where statfs can report misleading / incorrect information where
project quota is enabled, and the free space is less than the
remaining quota.

This commit will resolve a test failure in generic/762 which tests
for this bug.

generic/762       - output mismatch (see /share/git/fstests/results//generic/762.out.bad)
    --- tests/generic/762.out   2025-04-15 10:21:53.371067071 +0800
    +++ /share/git/fstests/results//generic/762.out.bad 2025-05-13 16:13:37.000000000 +0800
    @@ -6,8 +6,10 @@
     root blocks2 is in range
     dir blocks2 is in range
     root bavail2 is in range
    -dir bavail2 is in range
    +dir bavail2 has value of 1539066
    +dir bavail2 is NOT in range 304734.87 .. 310891.13
     root blocks3 is in range
    ...
    (Run 'diff -u /share/git/fstests/tests/generic/762.out /share/git/fstests/results//generic/762.out.bad'  to see the entire diff)

HINT: You _MAY_ be missing kernel fix:
      XXXXXXXXXXXXXX xfs: don't over-report free space or inodes in statvfs

Cc: stable@kernel.org
Fixes: ddc34e328d ("f2fs: introduce f2fs_statfs_project")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-13 15:32:41 +00:00
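
The clamping behind the fix in the entry above can be sketched as plain arithmetic. The following userspace C is an assumption-laden illustration, not the f2fs implementation: struct toy_statfs and toy_statfs_project() are hypothetical, and 'limit' and 'used' stand for the project quota block limit and the blocks already charged to the project.

```c
#include <assert.h>
#include <stdint.h>

/* Toy subset of the statfs result (hypothetical). */
struct toy_statfs {
	uint64_t f_blocks;
	uint64_t f_bfree;
	uint64_t f_bavail;
};

static uint64_t toy_min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Never report more space than either the quota or the filesystem allows. */
static void toy_statfs_project(struct toy_statfs *buf,
			       uint64_t limit, uint64_t used)
{
	uint64_t remaining;

	assert(buf != 0);
	if (limit == 0)
		return; /* no project quota limit configured */

	remaining = limit > used ? limit - used : 0;
	buf->f_blocks = toy_min(buf->f_blocks, limit);
	buf->f_bfree = toy_min(buf->f_bfree, remaining);
	buf->f_bavail = toy_min(buf->f_bavail, remaining);
}
```

When the filesystem has 100 free blocks but the project still has 450 blocks of quota left, the reported free space stays at 100 instead of being raised to 450.
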
Kairui Song 0427e811c9 f2fs: drop usage of folio_index
folio_index is only needed for mixed usage of page cache and swap
cache, for pure page cache usage, the caller can just use
folio->index instead.

It can't be a swap cache folio here.  Swap mapping may only call into fs
through `swap_rw` but f2fs does not use that method for swap.

Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org> (maintainer:F2FS FILE SYSTEM)
Cc: Chao Yu <chao@kernel.org> (maintainer:F2FS FILE SYSTEM)
Cc: linux-f2fs-devel@lists.sourceforge.net (open list:F2FS FILE SYSTEM)
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06 15:46:55 +00:00
Chao Yu 0244c77fed f2fs: support FAULT_TIMEOUT
Support injecting a timeout fault into a function. Currently it only
supports injecting a timeout into the commit_atomic_write flow to reproduce
inconsistency bugs, like the bug fixed by commit f098aeba04 ("f2fs:
fix to avoid atomicity corruption of atomic file").

By default, the new type fault will inject 1000ms timeout, and the
timeout process can be interrupted by SIGKILL.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-06 15:46:55 +00:00
Chao Yu dc6d9ef57f f2fs: zone: fix to calculate first_zoned_segno correctly
A zoned device can have both conventional zones and sequential zones,
so we should not treat the first segment of a zoned device as
first_zoned_segno. Instead, we need to check the zone type of each zone
while traversing the zoned device to find first_zoned_segno.

Otherwise, for below case, first_zoned_segno will be 0, which could be
wrong.

create_null_blk 512 2 1024 1024
mkfs.f2fs -m /dev/nullb0

Testcase:

export SCRIPTS_PATH=/share/git/scripts

test multiple devices w/ zoned device
for ((i=0;i<8;i++)) do {
	zonesize=$((2<<$i))
	conzone=$((4096/$zonesize))
	seqzone=$((4096/$zonesize))
	$SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone
	mkfs.f2fs -f -m /dev/vdb -c /dev/nullb0
	mount /dev/vdb /mnt/f2fs
	touch /mnt/f2fs/file
	f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2))
	stat /mnt/f2fs/file
	df
	cat /proc/fs/f2fs/vdb/segment_info
	umount /mnt/f2fs
	$SCRIPTS_PATH/nullblk_remove.sh 0
} done

test single zoned device
for ((i=0;i<8;i++)) do {
	zonesize=$((2<<$i))
	conzone=$((4096/$zonesize))
	seqzone=$((4096/$zonesize))
	$SCRIPTS_PATH/nullblk_create.sh 512 $zonesize $conzone $seqzone
	mkfs.f2fs -f -m /dev/nullb0
	mount /dev/nullb0 /mnt/f2fs
	touch /mnt/f2fs/file
	f2fs_io pinfile set /mnt/f2fs/file $((8589934592*2))
	stat /mnt/f2fs/file
	df
	cat /proc/fs/f2fs/nullb0/segment_info
	umount /mnt/f2fs
	$SCRIPTS_PATH/nullblk_remove.sh 0
} done

Fixes: 9703d69d9d ("f2fs: support file pinning for zoned devices")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:48 +00:00
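
The lookup described in the entry above, checking each zone's type instead of assuming the device starts with a sequential zone, can be sketched as below. The zone array, the fixed segs_per_zone value and the function name are assumptions for this example; the kernel works on reported zones and block addresses instead.

```c
#include <assert.h>
#include <stddef.h>

enum toy_zone_type { TOY_ZONE_CONVENTIONAL, TOY_ZONE_SEQUENTIAL };

/*
 * Return the segment number at which the first sequential zone starts,
 * or -1 if the device has no sequential zone at all.
 */
static long toy_first_zoned_segno(const enum toy_zone_type *zones,
				  size_t nr_zones, unsigned int segs_per_zone)
{
	size_t i;

	assert(zones != NULL || nr_zones == 0);
	for (i = 0; i < nr_zones; i++)
		if (zones[i] == TOY_ZONE_SEQUENTIAL)
			return (long)(i * segs_per_zone);
	return -1;
}
```

For a device with two conventional zones followed by sequential ones and four segments per zone, the result is 8 rather than 0.
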
Chao Yu 5db0d252c6 f2fs: fix to do sanity check on sit_bitmap_size
w/ below testcase, resize will generate a corrupted image which
contains inconsistent metadata, so when mounting such image, it
will trigger kernel panic:

touch img
truncate -s $((512*1024*1024*1024)) img
mkfs.f2fs -f img $((256*1024*1024))
resize.f2fs -s -i img -t $((1024*1024*1024))
mount img /mnt/f2fs

------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.h:863!
Oops: invalid opcode: 0000 [#1] SMP PTI
CPU: 11 UID: 0 PID: 3922 Comm: mount Not tainted 6.15.0-rc1+ #191 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:f2fs_ra_meta_pages+0x47c/0x490

Call Trace:
 f2fs_build_segment_manager+0x11c3/0x2600
 f2fs_fill_super+0xe97/0x2840
 mount_bdev+0xf4/0x140
 legacy_get_tree+0x2b/0x50
 vfs_get_tree+0x29/0xd0
 path_mount+0x487/0xaf0
 __x64_sys_mount+0x116/0x150
 do_syscall_64+0x82/0x190
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fdbfde1bcfe

The reason is:

sit_i->bitmap_size is 192, so size of sit bitmap is 192*8=1536, at maximum
there are 1536 sit blocks, however MAIN_SEGS is 261893, so that sit_blk_cnt
is 4762, build_sit_entries() -> current_sit_addr() tries to access
out-of-boundary in sit_bitmap at offset from [1536, 4762), once sit_bitmap
and sit_bitmap_mirror is not the same, it will trigger f2fs_bug_on().

Let's add sanity check in f2fs_sanity_check_ckpt() to avoid panic.

Cc: stable@vger.kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:48 +00:00
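
The numbers in the entry above can be reproduced with a short calculation. The sketch below assumes 55 SIT entries per block, a value that yields the 4762 figure quoted in the message; the helper names are hypothetical and the check is a simplified stand-in for what f2fs_sanity_check_ckpt() verifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: number of SIT entries stored in one SIT block. */
#define TOY_SIT_ENTRIES_PER_BLOCK 55u

/* Number of SIT blocks needed to describe main_segs segments (rounded up). */
static uint32_t toy_sit_blk_cnt(uint32_t main_segs)
{
	return (main_segs + TOY_SIT_ENTRIES_PER_BLOCK - 1) /
	       TOY_SIT_ENTRIES_PER_BLOCK;
}

/* The bitmap holds one bit per SIT block, so it must cover all of them. */
static bool toy_sit_bitmap_covers(uint32_t bitmap_size_bytes,
				  uint32_t main_segs)
{
	assert(bitmap_size_bytes <= UINT32_MAX / 8);
	return toy_sit_blk_cnt(main_segs) <= bitmap_size_bytes * 8;
}
```

With a 192-byte bitmap (1536 bits) and 261893 main segments, 4762 SIT blocks are required, so the check fails and the image should be rejected rather than mounted.
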
Matthew Wilcox (Oracle) 0d1e687e43 f2fs: Use a folio in f2fs_quota_read()
Support arbitrary size folios and remove a few hidden calls to
compound_head().  Also remove an unnecessary test of the uptodate flag;
if mapping_read_folio_gfp() cannot bring the folio uptodate, it will
return an error.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:21:35 +00:00
Chao Yu db03c20c08 f2fs: fix to set atomic write status more clear
1. After we start atomic write in a database file, before committing
all data, we'd better not set inode w/ vfs dirty status to avoid
redundant updates, instead, we only set inode w/ atomic dirty status.

2. After we commit all data, before committing metadata, we need to
clear atomic dirty status, and set vfs dirty status to allow vfs flush
dirty inode.

Cc: Daeho Jeong <daehojeong@google.com>
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:58 +00:00
Chao Yu 2be96c2147 f2fs: fix to update injection attrs according to fault_option
When we update inject type via sysfs, it shows wrong rate value as
below, there is a same problem when we update inject rate, fix it.

Before:
F2FS-fs (vdd): build fault injection attr: rate: 0, type: 0xffff
F2FS-fs (vdd): build fault injection attr: rate: 1, type: 0x0

After:
F2FS-fs (vdd): build fault injection type: 0x1
F2FS-fs (vdd): build fault injection rate: 1

Meanwhile, let's avoid turning on all fault types when we enable fault
injection via fault_injection mount option, it will lead to shutdown
filesystem or fail the mount() easily.

mount -o fault_injection=4 /dev/vdd /mnt/f2fs
F2FS-fs (vdd): build fault injection attr: rate: 4, type: 0x7fffff
F2FS-fs (vdd): inject kmalloc in f2fs_kmalloc of f2fs_fill_super+0xbdf/0x27c0

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:58 +00:00
Chao Yu e073e92789 f2fs: add a proc entry show inject stats
This patch adds a proc entry named inject_stats to show total injected
count for each fault type.

cat /proc/fs/f2fs/<dev>/inject_stats
fault_type              injected_count
kmalloc                 0
kvmalloc                0
page alloc              0
page get                0
alloc bio(obsolete)     0
alloc nid               0
orphan                  0
no more block           0
too big dir depth       0
evict_inode fail        0
truncate fail           0
read IO error           0
checkpoint error        0
discard error           0
write IO error          0
slab alloc              0
dquot initialize        0
lock_op                 0
invalid blkaddr         0
inconsistent blkaddr    0
no free segment         0
inconsistent footer     0

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:57 +00:00
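
Per-type injection counters like those printed by inject_stats can be modelled with a small userspace sketch. Everything here is hypothetical (the toy_* names, the three fault types, and the rule of injecting on every rate-th eligible call); it only illustrates how a type bitmask, a rate and per-type counts could fit together.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_fault {
	TOY_FAULT_KMALLOC,
	TOY_FAULT_KVMALLOC,
	TOY_FAULT_PAGE_ALLOC,
	TOY_FAULT_MAX,
};

struct toy_fault_info {
	unsigned int inject_type;  /* bitmask of enabled fault types */
	unsigned int inject_rate;  /* inject on every rate-th call; 0 = off */
	unsigned int inject_ops;   /* eligible calls since the last injection */
	unsigned int inject_count[TOY_FAULT_MAX]; /* per-type statistics */
};

/* Decide whether this call should fail, and account for it if so. */
static bool toy_time_to_inject(struct toy_fault_info *ffi, enum toy_fault type)
{
	assert(type < TOY_FAULT_MAX);

	if (ffi->inject_rate == 0)
		return false;
	if (!(ffi->inject_type & (1u << type)))
		return false;
	if (++ffi->inject_ops < ffi->inject_rate)
		return false;

	ffi->inject_ops = 0;
	ffi->inject_count[type]++;
	return true;
}
```

With only the kmalloc type enabled and a rate of 4, every fourth kmalloc check reports an injection and bumps that type's counter, while the other types never fire.
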
Yeongjin Gil f098aeba04 f2fs: fix to avoid atomicity corruption of atomic file
In the case of the following call stack for an atomic file,
FI_DIRTY_INODE is set, but FI_ATOMIC_DIRTIED is not subsequently set.

f2fs_file_write_iter
  f2fs_map_blocks
    f2fs_reserve_new_blocks
      inc_valid_block_count
        __mark_inode_dirty(dquot)
          f2fs_dirty_inode

If FI_ATOMIC_DIRTIED is not set, atomic file can encounter corruption
due to a mismatch between old file size and new data.

To resolve this issue, I changed to set FI_ATOMIC_DIRTIED when
FI_DIRTY_INODE is set. This ensures that FI_DIRTY_INODE, which was
previously cleared by the Writeback thread during the commit atomic, is
set and i_size is updated.

Cc: <stable@vger.kernel.org>
Fixes: fccaa81de8 ("f2fs: prevent atomic file from being dirtied before commit")
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-17 17:38:33 +00:00
Eric Sandeen 71e9bd3d5c f2fs: pass sbi rather than sb to parse_options()
With the new mount API the sb will not be available during initial option
parsing, which will happen before fill_super reads sb from disk.

Now that the sb is no longer directly referenced in parse_options, switch
it to use sbi.

(Note that all calls to f2fs_sb_has_* originating from parse_options will
need to be deferred to later before we can use the new mount API.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:07 +00:00
Eric Sandeen b7de231b9d f2fs: pass sbi rather than sb to quota qf_name helpers
With the new mount api we will not have the superblock available during
option parsing. Prepare for this by passing *sbi rather than *sb.

For now, we are parsing after fill_super has been done, so sbi->sb will
exist. Under the new mount API this will require more care, but do the
simple change for now.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen 9cca498759 f2fs: defer readonly check vs norecovery
Defer the readonly-vs-norecovery check until after option parsing is done
so that option parsing does not require an active superblock for the test.
Add a helpful message, while we're at it.

(I think could be moved back into parsing after we switch to the new mount
API if desired, as the fs context will have RO state available.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen 0edcb2197e f2fs: Pass sbi rather than sb to f2fs_set_test_dummy_encryption
This removes another sb instance from parse_options()

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen 9100adf326 f2fs: make LAZYTIME a mount option flag
Set LAZYTIME into sbi during parsing, and transfer it to the sb in
fill_super, so that an sb is not required during option parsing.

(Note: While lazytime is normally handled via mount flag in the vfs,
some f2fs users do expect to be able to use it as an explicit mount
option string via the mount syscall, so this option must remain.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen 7d6ee50330 f2fs: make INLINECRYPT a mount option flag
Set INLINECRYPT into sbi during parsing, and transfer it to the sb in
fill_super, so that an sb is not required during option parsing.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen abd0e040e9 f2fs: factor out an f2fs_default_check function
The current options parsing function both parses options and validates
them - factor the validation out to reduce the size of the function and
make transition to the new mount API possible, because under the new mount
API, options are parsed one at a time, and cannot all be tested at the end
of the parsing function.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen 277352b6cb f2fs: consolidate unsupported option handling errors
When certain build-time options are disabled, some mount options are not
accepted. For quota and compression, all related options are dismissed
with a single error message. For xattr, acl, and fault injection, each
option is handled individually. In addition, inline_xattr_size was missed
when CONFIG_F2FS_FS_XATTR was disabled.

Collapse xattr, acl, and fault injection errors into a single string, for
simplicity, and handle the missing inline_xattr_size case.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Eric Sandeen 64ee7503cb f2fs: use f2fs_sb_has_device_alias during option parsing
Rather than using F2FS_HAS_FEATURE directly, use f2fs_sb_has_device_alias
macro during option parsing for consistency.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:16:06 +00:00
Daeho Jeong d7b549def0 f2fs: add carve_out sysfs node
For several zoned storage devices, vendors provide more space than
the spec states; this extra space was used for device-level GC, and
F2FS can use it for filesystem-level GC. To do that, we can reserve
the space using reserved_blocks. However, that is not enough, since
this extra space should not be shown to users. So, with this new
sysfs node, we can hide the space by subtracting reserved_blocks
from total bytes.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-13 18:15:59 +00:00
Chao Yu 1788971e0b f2fs: introduce FAULT_INCONSISTENT_FOOTER
To simulate inconsistent node footer error.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 03:25:53 +00:00
Chao Yu c2ecba0265 f2fs: control nat_bits feature via mount option
Introduce a new mount option "nat_bits" to control the nat_bits feature.
By default, the nat_bits feature is disabled.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-08 16:04:10 +00:00
Chao Yu d8f5b91d77 f2fs: fix to call f2fs_recover_quota_end() correctly
f2fs_recover_quota_begin() and f2fs_recover_quota_end() must be called
as a pair, but in some cases we may skip calling f2fs_recover_quota_end().
Fix it.

Fixes: e1bb7d3d9c ("f2fs: fix to recover quota data correctly")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-04 00:47:13 +00:00
Jaegeuk Kim 201e07aec6 f2fs: fix the missing write pointer correction
If checkpoint was disabled, we failed to fix the write pointers.

Cc: <stable@vger.kernel.org>
Fixes: 1015035609 ("f2fs: fix changing cursegs if recovery fails on zoned device")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-28 16:23:01 +00:00
Jaegeuk Kim ef0c333cad f2fs: keep POSIX_FADV_NOREUSE ranges
This patch records POSIX_FADV_NOREUSE ranges so that users can reclaim the
cached pages from the LRU immediately.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-13 17:58:31 +00:00
Chao Yu 4f91f07470 f2fs: add dump_stack() in f2fs_handle_critical_error()
Show the call stack so that we can see what caused the critical error. Note
that dump_stack() is not called on the shutdown path.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-12 02:28:22 +00:00
Chao Yu eb85c2410d f2fs: quota: fix to avoid warning in dquot_writeback_dquots()
F2FS-fs (dm-59): checkpoint=enable has some unwritten data.

------------[ cut here ]------------
WARNING: CPU: 6 PID: 8013 at fs/quota/dquot.c:691 dquot_writeback_dquots+0x2fc/0x308
pc : dquot_writeback_dquots+0x2fc/0x308
lr : f2fs_quota_sync+0xcc/0x1c4
Call trace:
dquot_writeback_dquots+0x2fc/0x308
f2fs_quota_sync+0xcc/0x1c4
f2fs_write_checkpoint+0x3d4/0x9b0
f2fs_issue_checkpoint+0x1bc/0x2c0
f2fs_sync_fs+0x54/0x150
f2fs_do_sync_file+0x2f8/0x814
__f2fs_ioctl+0x1960/0x3244
f2fs_ioctl+0x54/0xe0
__arm64_sys_ioctl+0xa8/0xe4
invoke_syscall+0x58/0x114

Checkpoint and f2fs_remount may race as shown below, triggering a warning
in dquot_writeback_dquots().

atomic write                                    remount
                                                - do_remount
                                                 - down_write(&sb->s_umount);
                                                  - f2fs_remount
- ioctl
 - f2fs_do_sync_file
  - f2fs_sync_fs
   - f2fs_write_checkpoint
    - block_operations
     - locked = down_read_trylock(&sbi->sb->s_umount)
       : fails to lock because the write lock is held by remount
                                                 - up_write(&sb->s_umount);
     - f2fs_quota_sync
      - dquot_writeback_dquots
       - WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount))
       : trigger warning because s_umount lock was unlocked by remount

If checkpoint comes from mount/umount/remount/freeze/quotactl, the caller of
checkpoint already holds the s_umount lock, so calling
dquot_writeback_dquots() in that context should be safe.

So let's record the task in sbi->umount_lock_holder, so that checkpoint can
tell whether the lock is held in the current context by comparing current
with it.

In addition, in order not to misrepresent the caller of checkpoint, we should
not allow an async checkpoint to be triggered for these callers:
mount/umount/remount/freeze/quotactl.
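
The holder check described above can be sketched as a small user-space
model. Everything here is a stand-in: struct task and struct sb_info are
hypothetical substitutes for the kernel's task and superblock info
structures, and the helper names are invented for illustration. Only the
umount_lock_holder field name comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel task and f2fs superblock info. */
struct task {
	int pid;
};

struct sb_info {
	const struct task *umount_lock_holder;
};

/* A path that already took s_umount (e.g. remount) records itself. */
static void record_holder(struct sb_info *sbi, const struct task *cur)
{
	sbi->umount_lock_holder = cur;
}

/* The same path clears the record before it drops s_umount. */
static void clear_holder(struct sb_info *sbi)
{
	sbi->umount_lock_holder = NULL;
}

/* Checkpoint compares the recorded holder with the current task. */
static bool holds_umount_lock(const struct sb_info *sbi,
			      const struct task *cur)
{
	return sbi->umount_lock_holder == cur;
}

/*
 * A caller that holds s_umount must run the checkpoint in its own
 * context; handing it to another thread would lose the holder identity.
 */
static bool may_issue_async_checkpoint(const struct sb_info *sbi,
				       const struct task *cur)
{
	return !holds_umount_lock(sbi, cur);
}
```

In this model an ioctl-driven checkpoint never matches the recorded
holder, so it would still need to take the lock itself before syncing
quota.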

Fixes: af033b2aa8 ("f2fs: guarantee journalled quota data by checkpoint")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-10 16:58:42 +00:00
Eric Biggers 3ca4bec40e f2fs: switch to using the crc32 library
Now that the crc32() library function takes advantage of
architecture-specific optimizations, it is unnecessary to go through the
crypto API.  Just use crc32().  This is much simpler, and it improves
performance due to eliminating the crypto API overhead.
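
For readers unfamiliar with the calling convention, here is a portable
bitwise model of a little-endian CRC-32 that, like the kernel's crc32(),
takes a caller-supplied seed, a buffer, and a length. The function name
crc32_le_model is hypothetical, and this slow loop is only an
illustration; the library function uses optimized implementations.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise little-endian CRC-32 with the reflected polynomial
 * 0xEDB88320. The seed is supplied by the caller and no final
 * inversion is applied.
 */
static uint32_t crc32_le_model(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}
```

Seeding with all ones and inverting the result gives the conventional
CRC-32 check value: ~crc32_le_model(~0u, "123456789", 9) is 0xCBF43926.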

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-19-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-01 17:23:02 -08:00
Chao Yu bc8aeb04fd f2fs: fix to drop all discards after creating snapshot on lvm device
Piergiorgio reported a bug in bugzilla as below:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330
RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs]
Call Trace:
 __issue_discard_cmd+0x1ca/0x350 [f2fs]
 issue_discard_thread+0x191/0x480 [f2fs]
 kthread+0xcf/0x100
 ret_from_fork+0x31/0x50
 ret_from_fork_asm+0x1a/0x30

The following test case reproduces this bug quickly:
- pvcreate /dev/vdb
- vgcreate myvg1 /dev/vdb
- lvcreate -L 1024m -n mylv1 myvg1
- mount /dev/myvg1/mylv1 /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20
- sync
- rm /mnt/f2fs/file
- sync
- lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1
- umount /mnt/f2fs

The root cause is that creating a snapshot on a mounted lvm device
updates discard_max_bytes of that device to zero. After that,
__submit_discard_cmd() passes @nr_sects with a zero value to
__blkdev_issue_discard(), which returns a NULL bio pointer, resulting
in a panic.

This patch fixes the issue as follows:
1. Drop all remaining discards in f2fs_unfreeze() if a snapshot
of the lvm device has been created.
2. Check discard_max_bytes before submitting a discard in
__submit_discard_cmd().
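
The second point can be sketched as a user-space model of splitting a
discard range into device-sized chunks. This is not the f2fs code: the
function name, the 512-byte sector shift constant, and the -1 return
convention are assumptions made for illustration.

```c
#include <stdint.h>

#define MODEL_SECTOR_SHIFT 9	/* assumed 512-byte sectors */

/*
 * Return how many chunks a discard of total_sectors would be split
 * into, or -1 when the device currently accepts no discards
 * (discard_max_bytes too small to cover a single sector).
 */
static int64_t count_discard_chunks(uint64_t total_sectors,
				    uint64_t discard_max_bytes)
{
	uint64_t max_sectors = discard_max_bytes >> MODEL_SECTOR_SHIFT;

	/* Guard first: a zero limit would otherwise yield zero-length chunks. */
	if (max_sectors == 0)
		return -1;

	return (int64_t)(total_sectors / max_sectors +
			 (total_sectors % max_sectors != 0));
}
```

Checking the limit before any splitting means the caller never builds a
zero-sized request, which is the condition that led to the NULL bio in
the report above.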

Cc: stable@vger.kernel.org
Fixes: 35ec7d5748 ("f2fs: split discard command in prior to block layer")
Reported-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-23 15:48:15 +00:00