Commit Graph

2861 Commits (09cfd3c52ea76f43b3cb15e570aeddf633d65e80)

Author SHA1 Message Date
Miklos Szeredi 5ef7bcdeec ovl: relax redirect/metacopy requirements for lower -> data redirect
Allow the special case of a redirect from a lower layer to a data layer
without having to turn on metacopy.  This makes the feature work with
userxattr, which in turn allows data layers to be usable in user
namespaces.

Minimize the risk by only enabling redirect from a single lower layer to a
data layer iff a data layer is specified.  The only way to access a data
layer is to enable this, so there's really no reason not to enable this.

This can be used safely if the lower layer is read-only and the
user.overlay.redirect xattr cannot be modified.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-04-30 10:55:27 +02:00
Zijun Hu d1f482108a
fs/fs_parse: Remove unused and problematic validate_constant_table()
Remove validate_constant_table() since:

- It has no caller.

- It has below 3 bugs for good constant table array array[] which must
  end with a empty entry, and take below invocation for explaination:
  validate_constant_table(array, ARRAY_SIZE(array), ...)

  - Always return wrong value due to the last empty entry.
  - Imprecise error message for missorted case.
  - Potential NULL pointer dereference since the last pr_err() may use
    @tbl[i].name NULL pointer to print the last empty entry's name.

Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/20250415-fix_fs-v4-1-5d575124a3ff@quicinc.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-21 10:27:59 +02:00
Zijun Hu 296b67059e
fs/fs_parse: Delete macro fsparam_u32hex()
Delete macro fsparam_u32hex() since:

- it has no caller.

- it uses as type @fs_param_is_u32_hex which is never defined, so will
  cause compile error when caller uses it.

Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/20250411-fix_fs-v2-1-5d3395c102e4@quicinc.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-21 10:27:58 +02:00
Ritesh Harjani (IBM) 336bac5e08
Documentation: iomap: Add missing flags description
Let's document the use of these flags in iomap design doc where other
flags are defined too -

- IOMAP_F_BOUNDARY was added by XFS to prevent merging of I/O and I/O
  completions across RTG boundaries.
- IOMAP_F_ATOMIC_BIO was added for supporting atomic I/O operations
  for filesystems to inform the iomap that it needs HW-offload based
  mechanism for torn-write protection.

While we are at it, let's also fix the description of IOMAP_F_PRIVATE
flag after a recent:
commit 923936efeb ("iomap: Fix conflicting values of iomap flags")

Signed-off-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Link: https://lore.kernel.org/8d8534a704c4f162f347a84830710db32a927b2e.1744432270.git.ritesh.list@gmail.com
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-15 10:30:48 +02:00
Linus Torvalds 5aaaedb0cb A few more miscellaneous ext4 bug fixes and cleanups including some
syzbot failures and fixing a stale file handing refeencing an inode
 previously used as a regular file, but which has been deleted and
 reused as an ea_inode would result in ext4 erroneously consider this a
 case of fs corrupotion.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmf7r3YACgkQ8vlZVpUN
 gaPl9QgApwE5BAQdO6miW0sDMPj5b4sMc25aG4OPlfKhFqiIJB0Ub4zC2n0OFnaf
 HXk8P5oVeepH9ciTnYFF30X20Ythzjwmd9j5eyq2wsfYASQUjfcvmR9WovbqZtGQ
 3Zerd9QFp7SvZa+K4sADBhEb/7HAnxDGfiqSQptY6WQTwD+it1bnuhmzG0m6AH4m
 R1ItREDx7D2QrudDToFBd8XQ+FgRETZ8Qrs7PqIznw/dBNMdHRnAiw2eiyuoPU/S
 T8cmCxii3Z9sJ6LtohKYuWOmOmdxg951V5ZcekVRuaFSljSUsRsIplO7OlaMvQDs
 9vGVKiiZLdU2B0Wd90IeQUdJmP4xPg==
 =I8qx
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "A few more miscellaneous ext4 bug fixes and cleanups including some
  syzbot failures and fixing a stale file handing refeencing an inode
  previously used as a regular file, but which has been deleted and
  reused as an ea_inode would result in ext4 erroneously considering
  this a case of fs corruption"

* tag 'ext4_for_linus-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix off-by-one error in do_split
  ext4: make block validity check resistent to sb bh corruption
  ext4: avoid -Wflex-array-member-not-at-end warning
  Documentation: ext4: Add fields to ext4_super_block documentation
  ext4: don't treat fhandle lookup of ea_inode as FS corruption
2025-04-13 07:15:50 -07:00
Tom Vierjahn ce7e8a65aa Documentation: ext4: Add fields to ext4_super_block documentation
Documentation and implementation of the ext4 super block have
slightly diverged: Padding has been removed in order to make room for
new fields that are still missing in the documentation.

Add the new fields s_encryption_level, s_first_error_errorcode,
s_last_error_errorcode to the documentation of the ext4 super block.

Fixes: f542fbe8d5 ("ext4 crypto: reserve codepoints used by the ext4 encryption feature")
Fixes: 878520ac45 ("ext4: save the error code which triggered an ext4_error() in the superblock")

Signed-off-by: Tom Vierjahn <tom.vierjahn@acm.org>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20250324221004.5268-1-tom.vierjahn@acm.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-04-12 21:50:41 -04:00
David Howells f1745496d3
netfs: Update main API document
Bring the netfs documentation up to date.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/1690127.1744208325@warthog.procyon.org.uk
Reviewed-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Viacheslav Dubeyko <slava@dubeyko.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Timothy Day <timday@amazon.com>
cc: Jonathan Corbet <corbet@lwn.net>
cc: netfs@lists.linux.dev
cc: linux-doc@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-11 15:23:50 +02:00
Eric Biggers c07d3aede2 fscrypt: add support for hardware-wrapped keys
Add support for hardware-wrapped keys to fscrypt.  Such keys are
protected from certain attacks, such as cold boot attacks.  For more
information, see the "Hardware-wrapped keys" section of
Documentation/block/inline-encryption.rst.

To support hardware-wrapped keys in fscrypt, we allow the fscrypt master
keys to be hardware-wrapped.  File contents encryption is done by
passing the wrapped key to the inline encryption hardware via
blk-crypto.  Other fscrypt operations such as filenames encryption
continue to be done by the kernel, using the "software secret" which the
hardware derives.  For more information, see the documentation which
this patch adds to Documentation/filesystems/fscrypt.rst.

Note that this feature doesn't require any filesystem-specific changes.
However it does depend on inline encryption support, and thus currently
it is only applicable to ext4 and f2fs.

The version of this feature introduced by this patch is mostly
equivalent to the version that has existed downstream in the Android
Common Kernels since 2020.  However, a couple fixes are included.
First, the flags field in struct fscrypt_add_key_arg is now placed in
the proper location.  Second, key identifiers for HW-wrapped keys are
now derived using a distinct HKDF context byte; this fixes a bug where a
raw key could have the same identifier as a HW-wrapped key.  Note that
as a result of these fixes, the version of this feature introduced by
this patch is not UAPI or on-disk format compatible with the version in
the Android Common Kernels, though the divergence is limited to just
those specific fixes.  This version should be used going forwards.

This patch has been heavily rewritten from the original version by
Gaurav Kashyap <quic_gaurkash@quicinc.com> and
Barani Muthukumaran <bmuthuku@codeaurora.org>.

Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # sm8650
Link: https://lore.kernel.org/r/20250404225859.172344-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-04-08 19:32:11 -07:00
NeilBrown 06c567403a
Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS
try_lookup_noperm() and d_hash_and_lookup() are nearly identical.  The
former does some validation of the name where the latter doesn't.
Outside of the VFS that validation is likely valuable, and having only
one exported function for this task is certainly a good idea.

So make d_hash_and_lookup() local to VFS files and change all other
callers to try_lookup_noperm().  Note that the arguments are swapped.

Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250319031545.2999807-6-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-08 11:24:41 +02:00
NeilBrown fa6fe07d15
VFS: rename lookup_one_len family to lookup_noperm and remove permission check
The lookup_one_len family of functions is (now) only used internally by
a filesystem on itself either
- in a context where permission checking is irrelevant such as by a
  virtual filesystem populating itself, or xfs accessing its ORPHANAGE
  or dquota accessing the quota file; or
- in a context where a permission check (MAY_EXEC on the parent) has just
  been performed such as a network filesystem finding in "silly-rename"
  file in the same directory.  This is also the context after the
  _parentat() functions where currently lookup_one_qstr_excl() is used.

So the permission check is pointless.

The name "one_len" is unhelpful in understanding the purpose of these
functions and should be changed.  Most of the callers pass the len as
"strlen()" so using a qstr and QSTR() can simplify the code.

This patch renames these functions (include lookup_positive_unlocked()
which is part of the family despite the name) to have a name based on
"lookup_noperm".  They are changed to receive a 'struct qstr' instead
of separate name and len.  In a few cases the use of QSTR() results in a
new call to strlen().

try_lookup_noperm() takes a pointer to a qstr instead of the whole
qstr.  This is consistent with d_hash_and_lookup() (which is nearly
identical) and useful for lookup_noperm_unlocked().

The new lookup_noperm_common() doesn't take a qstr yet.  That will be
tidied up in a subsequent patch.

Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-5-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-08 11:24:36 +02:00
Matthew Wilcox (Oracle) 6b0dfabb35
fs: Remove aops->writepage
All callers and implementations are now removed, so remove the operation
and update the documentation to match.

Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250402150005.2309458-10-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 09:36:50 +02:00
NeilBrown 5741909697
VFS: improve interface for lookup_one functions
The family of functions:
  lookup_one()
  lookup_one_unlocked()
  lookup_one_positive_unlocked()

appear designed to be used by external clients of the filesystem rather
than by filesystems acting on themselves as the lookup_one_len family
are used.

They are used by:
   btrfs/ioctl - which is a user-space interface rather than an internal
     activity
   exportfs - i.e. from nfsd or the open_by_handle_at interface
   overlayfs - at access the underlying filesystems
   smb/server - for file service

They should be used by nfsd (more than just the exportfs path) and
cachefs but aren't.

It would help if the documentation didn't claim they should "not be
called by generic code".

Also the path component name is passed as "name" and "len" which are
(confusingly?) separate by the "base".  In some cases the len in simply
"strlen" and so passing a qstr using QSTR() would make the calling
clearer.
Other callers do pass separate name and len which are stored in a
struct.  Sometimes these are already stored in a qstr, other times it
easily could be.

So this patch changes these three functions to receive a 'struct qstr *',
and improves the documentation.

QSTR_LEN() is added to make it easy to pass a QSTR containing a known
len.

[brauner@kernel.org: take a struct qstr pointer]
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-2-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07 09:25:32 +02:00
Linus Torvalds bdafff62ae 9p update for 6.15-rc1
- fix handling of bogus (negative/too long) replies
 - fix crash on mkdir with ACLs
 (... looks like nobody is using ACLs with semi-recent kernels...)
 - ipv6 support for trans=tcp
 - minor concurrency fix to make syzbot happy
 - minor cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmfuAoIACgkQq06b7GqY
 5nBb9w/+K9WnU4MdSTFSXDJ+ZZTY//fPpFaUTqHl1hTeRjmIBtBdngy9ASvnPrPj
 n6DHnd+qkdFV6cMvs5wPUskRxJZuRDugzZMAd6yzjJoRNPmNFN2Ux7EXWEdFwvFG
 mk4EJtzgiZhp7XWlNzQeMziuDmMZJijzLsd4zVYNo9fNKEh5jLKjKWyHTVRxfuCc
 i22Y8oUgcghK0YSSLoL59xF4nRrvn57DBF3wnrW6pqVvVQ05NJRH4fNgXp4wW497
 jxQq01ela7IgNUoMgib7F0ov1fu8pSEd95T+fzcqynZCePQ9rzDbvt3MR7rjJuqo
 /VXwW7N3KT6DrQG6Wu21B9VcfBeWjdbtJ/GWGVp8d2iP04Sv0escx53qETZSD0iZ
 pMIZLthJuXlq9dmxZ/j+BPLlbm7uAFPbP15/O9Un5xVvrisANFm1TPvM77btnrEP
 KovWfooheoUrK6DmkKbkzS5HJH2ko4CASAG7c8GL+R1hXwVDswC06cecyvXaKQQK
 Um4nOe59hRqbqWXmIEs4jssoUjfg8MfuX71DvX0p6+r1WR+eySieG2HiTz/mTj0q
 /27cCWlAvjYxa42opxASAD1/HvW2tZfcPKtSQbh/3s0FBpTVqbof3fxmnTjcb0Po
 V7WpuRSD7DnmawjbQQLXznUQokagO23/ySO1vARnluKyGwsn5yI=
 =Q0mE
 -----END PGP SIGNATURE-----

Merge tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux

Pull 9p updates from Dominique Martinet:

 - fix handling of bogus (negative/too long) replies

 - fix crash on mkdir with ACLs (... looks like nobody is using ACLs
   with semi-recent kernels...)

 - ipv6 support for trans=tcp

 - minor concurrency fix to make syzbot happy

 - minor cleanup

* tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux:
  docs: fs/9p: Add missing "not" in cache documentation
  9p: Use hashtable.h for hash_errmap
  Documentation/fs/9p: fix broken link
  9p/trans_fd: mark concurrent read and writes to p9_conn->err
  9p/net: return error on bogus (longer than requested) replies
  9p/net: fix improper handling of bogus negative read/write replies
  fs/9p: fix NULL pointer dereference on mkdir
  net/9p/fd: support ipv6 for trans=tcp
2025-04-03 15:35:46 -07:00
Tingmao Wang 4210030d8b docs: fs/9p: Add missing "not" in cache documentation
A quick fix for what I assume is a typo.

Signed-off-by: Tingmao Wang <m@maowtm.org>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Message-ID: <20250330213443.98434-1-m@maowtm.org>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2025-04-03 12:31:11 +09:00
Linus Torvalds d6b02199cd - The 7 patch series "powerpc/crash: use generic crashkernel
reservation" from Sourabh Jain changes powerpc's kexec code to use more
   of the generic layers.
 
 - The 2 patch series "get_maintainer: report subsystem status
   separately" from Vlastimil Babka makes some long-requested improvements
   to the get_maintainer output.
 
 - The 4 patch series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the ucount
   code.
 
 - The 12 patch series "reboot: support runtime configuration of
   emergency hw_protection action" from Ahmad Fatoum improves the ability
   for a driver to perform an emergency system shutdown or reboot.
 
 - The 16 patch series "Converge on using secs_to_jiffies() part two"
   from Easwar Hariharan performs further migrations from
   msecs_to_jiffies() to secs_to_jiffies().
 
 - The 7 patch series "lib/interval_tree: add some test cases and
   cleanup" from Wei Yang permits more userspace testing of kernel library
   code, adds some more tests and performs some cleanups.
 
 - The 2 patch series "hung_task: Dump the blocking task stacktrace" from
   Masami Hiramatsu arranges for the hung_task detector to dump the stack
   of the blocking task and not just that of the blocked task.
 
 - The 4 patch series "resource: Split and use DEFINE_RES*() macros" from
   Andy Shevchenko provides some cleanups to the resource definition
   macros.
 
 - Plus the usual shower of singleton patches - please see the individual
   changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nuqwAKCRDdBJ7gKXxA
 jtNqAQDxqJpjWkzn4yN9CNSs1ivVx3fr6SqazlYCrt3u89WQvwEA1oRrGpETzUGq
 r6khQUIcQImPPcjFqEFpuiSOU0MBZA0=
 =Kii8
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
2025-04-01 10:06:52 -07:00
Linus Torvalds eb0ece1602 - The 6 patch series "Enable strict percpu address space checks" from
Uros Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.
 
   This has caused a small amount of fallout - two or three issues were
   reported.  In all cases the calling code was founf to be incorrect.
 
 - The 4 patch series "Some cleanup for memcg" from Chen Ridong
   implements some relatively monir cleanups for the memcontrol code.
 
 - The 17 patch series "mm: fixes for device-exclusive entries (hmm)"
   from David Hildenbrand fixes a boatload of issues which David found then
   using device-exclusive PTE entries when THP is enabled.  More work is
   needed, but this makes thins better - our own HMM selftests now succeed.
 
 - The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry
   Ahmed remove the z3fold and zbud implementations.  They have been
   deprecated for half a year and nobody has complained.
 
 - The 5 patch series "mm: further simplify VMA merge operation" from
   Lorenzo Stoakes implements numerous simplifications in this area.  No
   runtime effects are anticipated.
 
 - The 4 patch series "mm/madvise: remove redundant mmap_lock operations
   from process_madvise()" from SeongJae Park rationalizes the locking in
   the madvise() implementation.  Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.
 
 - The 12 patch series "Tiny cleanup and improvements about SWAP code"
   from Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.
 
 - The 2 patch series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak user-visible
   output.
 
 - The 2 patch series "mm/damon/paddr: fix large folios access and
   schemes handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.
 
 - The 3 patch series "mm/damon/core: fix wrong and/or useless
   damos_walk() behaviors" from SeongJae Park fixes a few issues with the
   accuracy of kdamond's walking of DAMON regions.
 
 - The 3 patch series "expose mapping wrprotect, fix fb_defio use" from
   Lorenzo Stoakes changes the interaction between framebuffer deferred-io
   and core MM.  No functional changes are anticipated - this is
   preparatory work for the future removal of page structure fields.
 
 - The 4 patch series "mm/damon: add support for hugepage_size DAMOS
   filter" from Usama Arif adds a DAMOS filter which permits the filtering
   by huge page sizes.
 
 - The 4 patch series "mm: permit guard regions for file-backed/shmem
   mappings" from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state.  The feature now covers shmem and
   file-backed mappings.
 
 - The 4 patch series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping for
   pte-mapped large folios.
 
 - The 18 patch series "reimplement per-vma lock as a refcount" from
   Suren Baghdasaryan puts the vm_lock back into the vma.  Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy.  This patchset provides small (0-10%) improvements on one
   microbenchmark.
 
 - The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation
   fixes and improves" from SeongJae Park does some maintenance work on the
   DAMON docs.
 
 - The 27 patch series "hugetlb/CMA improvements for large systems" from
   Frank van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped
   pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.
 
 - The 19 patch series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.
 
 - The 12 patch series "selftests/mm: Some cleanups from trying to run
   them" from Brendan Jackman fixes a pile of unrelated issues which
   Brendan encountered while runnimg our selftests.
 
 - The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap"
   from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.
 
 - The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply wasn't
   being effective.
 
 - The 5 patch series "mm: cleanups for device-exclusive entries (hmm)"
   from David Hildenbrand implements a number of unrelated cleanups in this
   code.
 
 - The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman
   Khandual implements a number of preparatoty cleanups to the
   GENERIC_PTDUMP Kconfig logic.
 
 - The 8 patch series "mm/damon: auto-tune aggregation interval" from
   SeongJae Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.
 
 - The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some
   issues in powerpc, sparc and x86 lazy MMU implementations.  Ryan did
   this in preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.
 
 - The 2 patch series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the code
   easier to follow.
 
 - The 3 patch series "page_counter cleanup and size reduction" from
   Shakeel Butt cleans up the page_counter code and fixes a size increase
   which we accidentally added late last year.
 
 - The 3 patch series "Add a command line option that enables control of
   how many threads should be used to allocate huge pages" from Thomas
   Prescher does that.  It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.
 
 - The 3 patch series "Fix calculations in trace_balance_dirty_pages()
   for cgwb" from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.
 
 - The 9 patch series "mm/damon: make allow filters after reject filters
   useful and intuitive" from SeongJae Park improves the handling of allow
   and reject filters.  Behaviour is made more consistent and the
   documention is updated accordingly.
 
 - The 5 patch series "Switch zswap to object read/write APIs" from Yosry
   Ahmed updates zswap to the new object read/write APIs and thus permits
   the removal of some legacy code from zpool and zsmalloc.
 
 - The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang
   does as it claims.
 
 - The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts"
   from Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.
 
 - The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes
   is a preparatoty refactoring and cleanup of the mremap() code.
 
 - The 20 patch series "mm: MM owner tracking for large folios (!hugetlb)
   + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.
 
 - The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS
   filters based on handling layers" from SeongJae Park adds a couple of
   new sysfs directories to ease the management of DAMON/DAMOS filters.
 
 - The 13 patch series "arch, mm: reduce code duplication in mem_init()"
   from Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.
 
 - The 13 patch series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.
 
 - The 3 patch series "mm: page_ext: Introduce new iteration API" from
   Luiz Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.
 
 - The 8 patch series "Buddy allocator like (or non-uniform) folio split"
   from Zi Yan reworks the code to split a folio into smaller folios.  The
   main benefit is lessened memory consumption: fewer post-split folios are
   generated.
 
 - The 2 patch series "Minimize xa_node allocation during xarry split"
   from Zi Yan reduces the number of xarray xa_nodes which are generated
   during an xarray split.
 
 - The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.
 
 - The 3 patch series "Add tracepoints for lowmem reserves, watermarks
   and totalreserve_pages" from Martin Liu adds some more tracepoints to
   the page allocator code.
 
 - The 4 patch series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which SeongJae
   observed during his earlier madvise work.
 
 - The 3 patch series "mm/hwpoison: Fix regressions in memory failure
   handling" from Shuai Xue addresses two quite serious regressions which
   Shuai has observed in the memory-failure implementation.
 
 - The 5 patch series "mm: reliable huge page allocator" from Johannes
   Weiner makes huge page allocations cheaper and more reliable by reducing
   fragmentation.
 
 - The 5 patch series "Minor memcg cleanups & prep for memdescs" from
   Matthew Wilcox is preparatory work for the future implementation of
   memdescs.
 
 - The 4 patch series "track memory used by balloon drivers" from Nico
   Pache introduces a way to track memory used by our various balloon
   drivers.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for active
   pages" from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.
 
 - The 2 patch series "Adding Proactive Memory Reclaim Statistics" from
   Hao Jia separates the proactive reclaim statistics from the direct
   reclaim statistics.
 
 - The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio"
   from Jinjiang Tu fixes our handling of hwpoisoned pages within the
   reclaim code.
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nZaAAKCRDdBJ7gKXxA
 jsOWAPiP4r7CJHMZRK4eyJOkvS1a1r+TsIarrFZtjwvf/GIfAQCEG+JDxVfUaUSF
 Ee93qSSLR1BkNdDw+931Pu0mXfbnBw==
 =Pn2K
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "Enable strict percpu address space checks" from Uros
   Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.

   This has caused a small amount of fallout - two or three issues were
   reported. In all cases the calling code was found to be incorrect.

 - The series "Some cleanup for memcg" from Chen Ridong implements some
   relatively monir cleanups for the memcontrol code.

 - The series "mm: fixes for device-exclusive entries (hmm)" from David
   Hildenbrand fixes a boatload of issues which David found then using
   device-exclusive PTE entries when THP is enabled. More work is
   needed, but this makes thins better - our own HMM selftests now
   succeed.

 - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
   remove the z3fold and zbud implementations. They have been deprecated
   for half a year and nobody has complained.

 - The series "mm: further simplify VMA merge operation" from Lorenzo
   Stoakes implements numerous simplifications in this area. No runtime
   effects are anticipated.

 - The series "mm/madvise: remove redundant mmap_lock operations from
   process_madvise()" from SeongJae Park rationalizes the locking in the
   madvise() implementation. Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.

 - The series "Tiny cleanup and improvements about SWAP code" from
   Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.

 - The series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak
   user-visible output.

 - The series "mm/damon/paddr: fix large folios access and schemes
   handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.

 - The series "mm/damon/core: fix wrong and/or useless damos_walk()
   behaviors" from SeongJae Park fixes a few issues with the accuracy of
   kdamond's walking of DAMON regions.

 - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
   Stoakes changes the interaction between framebuffer deferred-io and
   core MM. No functional changes are anticipated - this is preparatory
   work for the future removal of page structure fields.

 - The series "mm/damon: add support for hugepage_size DAMOS filter"
   from Usama Arif adds a DAMOS filter which permits the filtering by
   huge page sizes.

 - The series "mm: permit guard regions for file-backed/shmem mappings"
   from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state. The feature now covers shmem and
   file-backed mappings.

 - The series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping
   for pte-mapped large folios.

 - The series "reimplement per-vma lock as a refcount" from Suren
   Baghdasaryan puts the vm_lock back into the vma. Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy. This patchset provides small (0-10%) improvements on one
   microbenchmark.

 - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
   improves" from SeongJae Park does some maintenance work on the DAMON
   docs.

 - The series "hugetlb/CMA improvements for large systems" from Frank
   van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.

 - The series "mm/damon: introduce DAMOS filter type for unmapped pages"
   from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.

 - The series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.

 - The series "selftests/mm: Some cleanups from trying to run them" from
   Brendan Jackman fixes a pile of unrelated issues which Brendan
   encountered while runnimg our selftests.

 - The series "fs/proc/task_mmu: add guard region bit to pagemap" from
   Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.

 - The series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply
   wasn't being effective.

 - The series "mm: cleanups for device-exclusive entries (hmm)" from
   David Hildenbrand implements a number of unrelated cleanups in this
   code.

 - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
   implements a number of preparatoty cleanups to the GENERIC_PTDUMP
   Kconfig logic.

 - The series "mm/damon: auto-tune aggregation interval" from SeongJae
   Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.

 - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
   powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
   preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.

 - The series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the
   code easier to follow.

 - The series "page_counter cleanup and size reduction" from Shakeel
   Butt cleans up the page_counter code and fixes a size increase which
   we accidentally added late last year.

 - The series "Add a command line option that enables control of how
   many threads should be used to allocate huge pages" from Thomas
   Prescher does that. It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.

 - The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
   from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.

 - The series "mm/damon: make allow filters after reject filters useful
   and intuitive" from SeongJae Park improves the handling of allow and
   reject filters. Behaviour is made more consistent and the documention
   is updated accordingly.

 - The series "Switch zswap to object read/write APIs" from Yosry Ahmed
   updates zswap to the new object read/write APIs and thus permits the
   removal of some legacy code from zpool and zsmalloc.

 - The series "Some trivial cleanups for shmem" from Baolin Wang does as
   it claims.

 - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
   Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.

 - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
   preparatoty refactoring and cleanup of the mremap() code.

 - The series "mm: MM owner tracking for large folios (!hugetlb) +
   CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.

 - The series "mm/damon: add sysfs dirs for managing DAMOS filters based
   on handling layers" from SeongJae Park adds a couple of new sysfs
   directories to ease the management of DAMON/DAMOS filters.

 - The series "arch, mm: reduce code duplication in mem_init()" from
   Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.

 - The series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.

 - The series "mm: page_ext: Introduce new iteration API" from Luiz
   Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.

 - The series "Buddy allocator like (or non-uniform) folio split" from
   Zi Yan reworks the code to split a folio into smaller folios. The
   main benefit is lessened memory consumption: fewer post-split folios
   are generated.

 - The series "Minimize xa_node allocation during xarry split" from Zi
   Yan reduces the number of xarray xa_nodes which are generated during
   an xarray split.

 - The series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.

 - The series "Add tracepoints for lowmem reserves, watermarks and
   totalreserve_pages" from Martin Liu adds some more tracepoints to the
   page allocator code.

 - The series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which
   SeongJae observed during his earlier madvise work.

 - The series "mm/hwpoison: Fix regressions in memory failure handling"
   from Shuai Xue addresses two quite serious regressions which Shuai
   has observed in the memory-failure implementation.

 - The series "mm: reliable huge page allocator" from Johannes Weiner
   makes huge page allocations cheaper and more reliable by reducing
   fragmentation.

 - The series "Minor memcg cleanups & prep for memdescs" from Matthew
   Wilcox is preparatory work for the future implementation of memdescs.

 - The series "track memory used by balloon drivers" from Nico Pache
   introduces a way to track memory used by our various balloon drivers.

 - The series "mm/damon: introduce DAMOS filter type for active pages"
   from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.

 - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
   separates the proactive reclaim statistics from the direct reclaim
   statistics.

 - The series "mm/vmscan: don't try to reclaim hwpoison folio" from
   Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
   code.

* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
  mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
  x86/mm: restore early initialization of high_memory for 32-bits
  mm/vmscan: don't try to reclaim hwpoison folio
  mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
  cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
  mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
  selftests/mm: speed up split_huge_page_test
  selftests/mm: uffd-unit-tests support for hugepages > 2M
  docs/mm/damon/design: document active DAMOS filter type
  mm/damon: implement a new DAMOS filter type for active pages
  fs/dax: don't disassociate zero page entries
  MM documentation: add "Unaccepted" meminfo entry
  selftests/mm: add commentary about 9pfs bugs
  fork: use __vmalloc_node() for stack allocation
  docs/mm: Physical Memory: Populate the "Zones" section
  xen: balloon: update the NR_BALLOON_PAGES state
  hv_balloon: update the NR_BALLOON_PAGES state
  balloon_compaction: update the NR_BALLOON_PAGES state
  meminfo: add a per node counter for balloon drivers
  mm: remove references to folio in __memcg_kmem_uncharge_page()
  ...
2025-04-01 09:29:18 -07:00
Linus Torvalds b6dde1e527 NFSD 6.15 Release Notes
Neil Brown contributed more scalability improvements to NFSD's
 open file cache, and Jeff Layton contributed a menagerie of
 repairs to NFSD's NFSv4 callback / backchannel implementation.
 
 Mike Snitzer contributed a change to NFS re-export support that
 disables support for file locking on a re-exported NFSv4 mount.
 This is because NFSv4 state recovery is currently difficult if
 not impossible for re-exported NFS mounts. The change aims to
 prevent data integrity exposures after the re-export server
 crashes.
 
 Work continues on the evolving NFSD netlink administrative API.
 
 Many thanks to the contributors, reviewers, testers, and bug
 reporters who participated during the v6.15 development cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmfmpMIACgkQM2qzM29m
 f5f6DA/+P0YqoRg3Zk/4oWwXZWbfEOMhWFltT+D1PE2QjUfOZpiwUSFQfsfYgXO6
 OFu0iDQ4g8BxBeP6Umv61qy7Cv6n4fVzIHqzymXQvymh9JzoQiXlE9/fA8nAHuiH
 u7kkNPRi7faBz1sMg/WpN9CHctg7STPOhhG/JrZcSFZnh87mU1i4i4bZBNz8tVnK
 ZWf483OUuSmJY2/bUTkwvr4GbceTKBlLWFFjiRhfAKvJBWvu4myfC0DI5QzxmsgI
 MJ62do7AFJP1ww2Ih9LLi2kFIt/yyInSVAgyts1CPhlJ4BfPnTSOw/i2+CuF3D/M
 bZYEAOjH3AqjBZmq58sIQezpD5f9/TOrTSwYwS31zl/THYE413WiW80/MDoWqo0y
 9cSNkD3nJlPVLLCfF58vXLoe7wpLoN/ZbTdxoozzUWEFR5A4Jz3XP8F/Cws0cjem
 uWWAQMItiQpg1+RYJYfu4dg5+iN6dbgYbvzlr7buISwFNXi3Zo99MkJ4wHj9TJbL
 Tpjth1rWGPwwSOMT6ojKiYMq1oUzx5PuAm9Saq9oIzQAbBySmxHF/LSDz3wEuBoO
 MK1jzKroEmMk3fJOOAajSDLOdAbL3vfj6H/xi2IHvKnaz9yHCZNu2YGV05BBMprd
 hWePf69AO5Ky5Q9KuGClEtwvJ9ZR5pb4DO2dqaYu8ximu3O4vPo=
 =e2E2
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Neil Brown contributed more scalability improvements to NFSD's open
  file cache, and Jeff Layton contributed a menagerie of repairs to
  NFSD's NFSv4 callback / backchannel implementation.

  Mike Snitzer contributed a change to NFS re-export support that
  disables support for file locking on a re-exported NFSv4 mount. This
  is because NFSv4 state recovery is currently difficult if not
  impossible for re-exported NFS mounts. The change aims to prevent data
  integrity exposures after the re-export server crashes.

  Work continues on the evolving NFSD netlink administrative API.

  Many thanks to the contributors, reviewers, testers, and bug reporters
  who participated during the v6.15 development cycle"

* tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits)
  NFSD: Add a Kconfig setting to enable delegated timestamps
  sysctl: Fixes nsm_local_state bounds
  nfsd: use a long for the count in nfsd4_state_shrinker_count()
  nfsd: remove obsolete comment from nfs4_alloc_stid
  nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault()
  nfsd: reorganize struct nfs4_delegation for better packing
  nfsd: handle errors from rpc_call_async()
  nfsd: move cb_need_restart flag into cb_flags
  nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
  nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY
  nfsd: prevent callback tasks running concurrently
  nfsd: disallow file locking and delegations for NFSv4 reexport
  nfsd: filecache: drop the list_lru lock during lock gc scans
  nfsd: filecache: don't repeatedly add/remove files on the lru list
  nfsd: filecache: introduce NFSD_FILE_RECENT
  nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc()
  nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync()
  NFSD: Re-organize nfsd_file_gc_worker()
  nfsd: filecache: remove race handling.
  fs: nfs: acl: Avoid -Wflex-array-member-not-at-end warning
  ...
2025-03-31 17:28:17 -07:00
Linus Torvalds 5c2a430e85 Ext4 bug fixes and cleanups, including:
* hardening against maliciously fuzzed file systems
   * backwards compatibility for the brief period when we attempted to
      ignore zero-width characters
   * avoid potentially BUG'ing if there is a file system corruption found
     during the file system unmount
   * fix free space reporting by statfs when project quotas are enabled
     and the free space is less than the remaining project quota
 
 Also improve performance when replaying a journal with a very large
 number of revoke records (applicable for Lustre volumes).
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmflfY4ACgkQ8vlZVpUN
 gaMx7Qf/akTELvyBZ7iPCCHh2HwayuO8qLhPNqrU0TmYMFvgwgYUPcQ3BLn8CE+/
 j5UeT8XxNaLU4GJn3z+q6yW6PnNHfqZqKry9j/iPc3s1mjTslntr/xENlgu6i4Bp
 Q58xc7Pj45vdmP+xmYhRnJcefgsZMvB/N1SEHxwIP8bntZqsEvP9pI82r9Ouc8SA
 ZLQ1/K4OADmk7f3GhlPr9AtgH7O0CjlAas30h/AW77DXBQl7ZgbDsGDlgTwaGqkR
 jHcvfr6hLnWy+MUVGmlNZ2HY6iUgBPItWlYCP/fsrUdnc+CONyl5E17JPSl1QQtR
 CLYlo4xV8j1+zJ094DjhDWMKI2G7jw==
 =oudL
 -----END PGP SIGNATURE-----

Merge tag 'ext4-for_linus-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Ext4 bug fixes and cleanups, including:

   - hardening against maliciously fuzzed file systems

   - backwards compatibility for the brief period when we attempted to
     ignore zero-width characters

   - avoid potentially BUG'ing if there is a file system corruption
     found during the file system unmount

   - fix free space reporting by statfs when project quotas are enabled
     and the free space is less than the remaining project quota

  Also improve performance when replaying a journal with a very large
  number of revoke records (applicable for Lustre volumes)"

* tag 'ext4-for_linus-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (71 commits)
  ext4: fix OOB read when checking dotdot dir
  ext4: on a remount, only log the ro or r/w state when it has changed
  ext4: correct the error handle in ext4_fallocate()
  ext4: Make sb update interval tunable
  ext4: avoid journaling sb update on error if journal is destroying
  ext4: define ext4_journal_destroy wrapper
  ext4: hash: simplify kzalloc(n * 1, ...) to kzalloc(n, ...)
  jbd2: add a missing data flush during file and fs synchronization
  ext4: don't over-report free space or inodes in statvfs
  ext4: clear DISCARD flag if device does not support discard
  jbd2: remove jbd2_journal_unfile_buffer()
  ext4: reorder capability check last
  ext4: update the comment about mb_optimize_scan
  jbd2: fix off-by-one while erasing journal
  ext4: remove references to bh->b_page
  ext4: goto right label 'out_mmap_sem' in ext4_setattr()
  ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all()
  ext4: introduce ITAIL helper
  jbd2: remove redundant function jbd2_journal_has_csum_v2or3_feature
  ext4: remove redundant function ext4_has_metadata_csum
  ...
2025-03-27 13:27:08 -07:00
Linus Torvalds 4a4b30ea80 bcachefs updates for 6.15
On disk format is now soft frozen: no more required/automatic are
 anticipated before taking off the experimental label.
 
 Major changes/features since 6.14:
 
 - Scrub
 
 - Blocksize greater than page size support
 
 - A number of "rebalance spinning and doing no work" issues have been
   fixed; we now check if the write allocation will succeed in
   bch2_data_update_init(), before kicking off the read.
 
   There's still more work to do in this area. Later we may want to add
   another bitset btree, like rebalance_work, to track "extents that
   rebalance was requested to move but couldn't", e.g. due to destination
   target having insufficient online devices.
 
 - We can now support scaling well into the petabyte range: latest
   bcachefs-tools will pick an appropriate bucket size at format time to
   ensure fsck can run in available memory (e.g. a server with 256GB of
   ram and 100PB of storage would want 16MB buckets).
 
 On disk format changes:
 
 - 1.21: cached backpointers (scalability improvement)
 
   Cached replicas now get backpointers, which means we no longer rely on
   incrementing bucket generation numbers to invalidate cached data: this
   lets us get rid of the bucket generation number garbage collection,
   which had to periodically rescan all extents to recompute bucket
   oldest_gen.
 
   Bucket generation numbers are now only used as a consistency check,
   but they're quite useful for that.
 
 - 1.22: stripe backpointers
 
   Stripes now have backpointers: erasure coded stripes have their own
   checksums, separate from the checksums for the extents they contain
   (and stripe checksums also cover the parity blocks). This is required
   for implementing scrub for stripes.
 
 - 1.23: stripe lru (scalability improvement)
 
   Persistent lru for stripes, ordered by "number of empty blocks". This
   is used by the stripe creation path, which depending on free space
   may create a new stripe out of a partially empty existing stripe
   instead of starting a brand new stripe.
 
   This replaces an in-memory heap, and means we no longer have to read
   in the stripes btree at startup.
 
 - 1.24: casefolding
 
   Case insensitive directory support, courtesy of Valve.
 
   This is an incompatible feature, to enable mount with
     -o version_upgrade=incompatible
 
 - 1.25: extent_flags
 
   Another incompatible feature requiring explicit opt-in to enable.
 
   This adds a flags entry to extents, and a flag bit that marks extents
   as poisoned.
 
   A poisoned extent is an extent that was unreadable due to checksum
   errors. We can't move such extents without giving them a new checksum,
   and we may have to move them (for e.g. copygc or device evacuate).
   We also don't want to delete them: in the future we'll have an API
   that lets userspace ignore checksum errors and attempt to deal with
   simple bitrot itself. Marking them as poisoned lets us continue to
   return the correct error to userspace on normal read calls.
 
 Other changes/features:
 
 - BCH_IOCTL_QUERY_COUNTERS: this is used by the new 'bcachefs fs top'
   command, which shows a live view of all internal filesystem counters.
 
 - Improved journal pipelining: we can now have 16 journal writes in
   flight concurrently, up from 4. We're logging significantly more to
   the journal than we used to with all the recent disk accounting
   changes and additions, so some users should see a performance
   increase on some workloads.
 
 - BCH_MEMBER_STATE_failed: previously, we would do no IO at all to
   devices marked as failed. Now we will attempt to read from them, but
   only if we have no better options.
 
 - New option, write_error_timeout: devices will be kicked out of the
   filesystem if all writes have been failing for x number of seconds.
 
   We now also kick devices out when notified by blk_holder_ops that
   they've gone offline.
 
 - Device option handling improvements: the discard option should now be
   working as expected (additionally, in -tools, all device options that
   can be set at format time can now be set at device add time, i.e.
   data_allowed, state).
 
 - We now try harder to read data after a checksum error: we'll do
   additional retries if necessary to a device after after it gave us
   data with a checksum error.
 
 - More self healing work: the full inode <-> dirent consistency checks
   that are currently run by fsck are now also run every time we do a
   lookup, meaning we'll be able to correct errors at runtime. Runtime
   self healing will be flipped on after the new changes have seen more
   testing, currently they're just checking for consistency.
 
 - KMSAN fixes: our KMSAN builds should be nearly clean now, which will
   put a massive dent in the syzbot dashboard.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmfhbnsACgkQE6szbY3K
 bnY6ew/9FXh3m71BvVpuqTYcUGzIC7gVrnkFy6n4W96v07OjSOoTNHOVVovajxc3
 P9LvA77BHC4Xro3H7ORpsIurOZUc6yx18ZizzulVbQFuYa7LY/kNri4ZBtGHcRiV
 pIdQDLSNmwFjPA4x2S1qTFSF1c586lad+UNQiLam5ophBwQPEO6vG51ZEHa4wld9
 +OhWTDYfrvij4D3Lt1ppvhuDP+PQBjhu/QFc0bGjHvKOjfV6sw9XU91sCYKOJIzd
 qzpsiQd5sepnX717Br3f5SLdxMq2lJYvRp9756vltOCaMBvJYJtHqtXCglHQEkFw
 yjhmPjk4r3VlKTF8K+wEJfAHwbC2kEn7csJNbt0+Nko5PPtFyrb8ok6QUbHCKscL
 L0VMnzaXHVqvG2VgYa31temfdz7HM/zHjQ8Al3eQPaqTHIoTXIBQxOQSea/apVMt
 TIlastvLoHfR8W7+LrwOmTjnBJGCJ+MrdcJzJDVk2tQmmcMA0boeZvl4aSklFuyB
 zNN5fxp0VMsxNyIHLJjQ3UcwVqHXC5w+f5H1ByQLUyQh+m/xaAaz7S+BTVdVbFPa
 1Z1xDuvuHOTnjIOamnOD1l36afJnhq5RciPCXCNtQSB819mc+AfNGQNQTVNOTReC
 iTiUCcNxu0/DIPlPmeJzAlukVJUgz+/knOI/6zPs3eI7/o88ZGg=
 =k3cV
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2025-03-24' of git://evilpiepirate.org/bcachefs

Pull bcachefs updates from Kent Overstreet:
 "On disk format is now soft frozen: no more required/automatic are
  anticipated before taking off the experimental label.

  Major changes/features since 6.14:

   - Scrub

   - Blocksize greater than page size support

   - A number of "rebalance spinning and doing no work" issues have been
     fixed; we now check if the write allocation will succeed in
     bch2_data_update_init(), before kicking off the read.

     There's still more work to do in this area. Later we may want to
     add another bitset btree, like rebalance_work, to track "extents
     that rebalance was requested to move but couldn't", e.g. due to
     destination target having insufficient online devices.

   - We can now support scaling well into the petabyte range: latest
     bcachefs-tools will pick an appropriate bucket size at format time
     to ensure fsck can run in available memory (e.g. a server with
     256GB of ram and 100PB of storage would want 16MB buckets).

  On disk format changes:

   - 1.21: cached backpointers (scalability improvement)

     Cached replicas now get backpointers, which means we no longer rely
     on incrementing bucket generation numbers to invalidate cached
     data: this lets us get rid of the bucket generation number garbage
     collection, which had to periodically rescan all extents to
     recompute bucket oldest_gen.

     Bucket generation numbers are now only used as a consistency check,
     but they're quite useful for that.

   - 1.22: stripe backpointers

     Stripes now have backpointers: erasure coded stripes have their own
     checksums, separate from the checksums for the extents they contain
     (and stripe checksums also cover the parity blocks). This is
     required for implementing scrub for stripes.

   - 1.23: stripe lru (scalability improvement)

     Persistent lru for stripes, ordered by "number of empty blocks".
     This is used by the stripe creation path, which depending on free
     space may create a new stripe out of a partially empty existing
     stripe instead of starting a brand new stripe.

     This replaces an in-memory heap, and means we no longer have to
     read in the stripes btree at startup.

   - 1.24: casefolding

     Case insensitive directory support, courtesy of Valve.

     This is an incompatible feature, to enable mount with
       -o version_upgrade=incompatible

   - 1.25: extent_flags

     Another incompatible feature requiring explicit opt-in to enable.

     This adds a flags entry to extents, and a flag bit that marks
     extents as poisoned.

     A poisoned extent is an extent that was unreadable due to checksum
     errors. We can't move such extents without giving them a new
     checksum, and we may have to move them (for e.g. copygc or device
     evacuate). We also don't want to delete them: in the future we'll
     have an API that lets userspace ignore checksum errors and attempt
     to deal with simple bitrot itself. Marking them as poisoned lets us
     continue to return the correct error to userspace on normal read
     calls.

  Other changes/features:

   - BCH_IOCTL_QUERY_COUNTERS: this is used by the new 'bcachefs fs top'
     command, which shows a live view of all internal filesystem
     counters.

   - Improved journal pipelining: we can now have 16 journal writes in
     flight concurrently, up from 4. We're logging significantly more to
     the journal than we used to with all the recent disk accounting
     changes and additions, so some users should see a performance
     increase on some workloads.

   - BCH_MEMBER_STATE_failed: previously, we would do no IO at all to
     devices marked as failed. Now we will attempt to read from them,
     but only if we have no better options.

   - New option, write_error_timeout: devices will be kicked out of the
     filesystem if all writes have been failing for x number of seconds.

     We now also kick devices out when notified by blk_holder_ops that
     they've gone offline.

   - Device option handling improvements: the discard option should now
     be working as expected (additionally, in -tools, all device options
     that can be set at format time can now be set at device add time,
     i.e. data_allowed, state).

   - We now try harder to read data after a checksum error: we'll do
     additional retries if necessary to a device after after it gave us
     data with a checksum error.

   - More self healing work: the full inode <-> dirent consistency
     checks that are currently run by fsck are now also run every time
     we do a lookup, meaning we'll be able to correct errors at runtime.
     Runtime self healing will be flipped on after the new changes have
     seen more testing, currently they're just checking for consistency.

   - KMSAN fixes: our KMSAN builds should be nearly clean now, which
     will put a massive dent in the syzbot dashboard"

* tag 'bcachefs-2025-03-24' of git://evilpiepirate.org/bcachefs: (180 commits)
  bcachefs: Kill unnecessary bch2_dev_usage_read()
  bcachefs: btree node write errors now print btree node
  bcachefs: Fix race in print_chain()
  bcachefs: btree_trans_restart_foreign_task()
  bcachefs: bch2_disk_accounting_mod2()
  bcachefs: zero init journal bios
  bcachefs: Eliminate padding in move_bucket_key
  bcachefs: Fix a KMSAN splat in btree_update_nodes_written()
  bcachefs: kmsan asserts
  bcachefs: Fix kmsan warnings in bch2_extent_crc_pack()
  bcachefs: Disable asm memcpys when kmsan enabled
  bcachefs: Handle backpointers with unknown data types
  bcachefs: Count BCH_DATA_parity backpointers correctly
  bcachefs: Run bch2_check_dirent_target() at lookup time
  bcachefs: Refactor bch2_check_dirent_target()
  bcachefs: Move bch2_check_dirent_target() to namei.c
  bcachefs: fs-common.c -> namei.c
  bcachefs: EIO cleanup
  bcachefs: bch2_write_prep_encoded_data() now returns errcode
  bcachefs: Simplify bch2_write_op_error()
  ...
2025-03-27 13:20:07 -07:00
Linus Torvalds 81d8e5e213 f2fs-for-6.15-rc1
In this round, there are three major updates: 1) folio conversion, 2) refactor
 for mount API conversion, 3) some performance improvement such as direct IO,
 checkpoint speed, and IO priority hints. For stability, there are patches
 which add more sanity checks and fixes some major issues like i_size in
 atomic write operations and write pointer recovery in zoned devices.
 
 Enhancement:
  - huge folio converion work by Matthew Wilcox
  - clean up for mount API conversion by Eric Sandeen
  - improve direct IO speed in the overwrite case
  - add some sanity check on node consistency
  - set highest IO priority for checkpoint thread
  - keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages
  - add ioctl to get IO priority hint
  - add carve_out sysfs node for fsstat
 
 Bug fix:
  - disable nat_bits during umount to avoid potential nat entry corruption
  - fix missing i_size update on atomic writes
  - fix missing discard for active segments
  - fix running out of free segments
  - fix out-of-bounds access in f2fs_truncate_inode_blocks()
  - call f2fs_recover_quota_end() correctly
  - fix potential deadloop in prepare_compress_overwrite()
  - fix the missing write pointer correction for zoned device
  - fix to avoid panic once fallocation fails for pinfile
  - don't retry IO for corrupted data scenario
 
 There are many other clean up patches and minor bug fixes as usual.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmfhjuYACgkQQBSofoJI
 UNJF+hAAip0Sf6bXwt43KR3lmbXg/YBTtPDACOs375Pn8pfRVsAPIAf5e1AnlBET
 rA0KpqUrEZHXenQFF9n4LIoHYqfuUw3EMempuvp4qgQcx15Kaajw+EGWVLYstNy2
 9dELc9DA55f/i1uvHezGfQFy6hMfXasf+tkaYk0z0ZYDStYboLgkVAY+F869q1Dl
 D16Y7Lna12h4eCxDdssIPDsLjFH/2LDn7SmhXsnpZxwK0Zx8JSo83WxXHnzXHxz4
 vUkukInKgqcBDvf6ufW/YF3/tqSs20XEXNK3cI1vyHx1dwij6Us+G0n0WJHL30QE
 zsliecR0X28JP1rJ3ldCjr+4Kxm6/u/Uwpinm2Fm1jL67UY4TZcaARBE8k20I6ND
 j/L1+sXrIdZ1aILM9/bwCjgXiVFdZbvlfGfpTj1duAkQgRd+/s9cjPlo5j5HTad5
 XJmzJz6YOaUNarhP/E31Z9SV9M2kEcmoDOTxKBg6ZcMWXZ27B6Z0ag92prd0GWk6
 rWDuVj0eP/LDY1QedbHRPbg1D84jVgcnldfPaf9ptln993skJGdgS0dkegTqMR0L
 H8RgpWOzWZI53gnQdePdej8diHmD8uRTrLf/oABm//GzTHC5BdwVpArOl1DCuLFF
 YkMtVEicgnWRmL3PgODAdTJXaDi3uGvT116i+lfMlUn9p+t93r0=
 =E+TE
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, there are three major updates: (1) folio conversion,
  (2) refactoring for mount API conversion, (3) some performance
  improvement such as direct IO, checkpoint speed, and IO priority
  hints.

  For stability, there are patches which add more sanity checks and
  fixes some major issues like i_size in atomic write operations and
  write pointer recovery in zoned devices.

  Enhancements:
   - huge folio converion work by Matthew Wilcox
   - clean up for mount API conversion by Eric Sandeen
   - improve direct IO speed in the overwrite case
   - add some sanity check on node consistency
   - set highest IO priority for checkpoint thread
   - keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages
   - add ioctl to get IO priority hint
   - add carve_out sysfs node for fsstat

  Bug fixes:
   - disable nat_bits during umount to avoid potential nat entry corruption
   - fix missing i_size update on atomic writes
   - fix missing discard for active segments
   - fix running out of free segments
   - fix out-of-bounds access in f2fs_truncate_inode_blocks()
   - call f2fs_recover_quota_end() correctly
   - fix potential deadloop in prepare_compress_overwrite()
   - fix the missing write pointer correction for zoned device
   - fix to avoid panic once fallocation fails for pinfile
   - don't retry IO for corrupted data scenario

  There are many other clean up patches and minor bug fixes as usual"

* tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
  f2fs: fix missing discard for active segments
  f2fs: optimize f2fs DIO overwrites
  f2fs: fix to avoid atomicity corruption of atomic file
  f2fs: pass sbi rather than sb to parse_options()
  f2fs: pass sbi rather than sb to quota qf_name helpers
  f2fs: defer readonly check vs norecovery
  f2fs: Pass sbi rather than sb to f2fs_set_test_dummy_encryption
  f2fs: make LAZYTIME a mount option flag
  f2fs: make INLINECRYPT a mount option flag
  f2fs: factor out an f2fs_default_check function
  f2fs: consolidate unsupported option handling errors
  f2fs: use f2fs_sb_has_device_alias during option parsing
  f2fs: add carve_out sysfs node
  f2fs: fix to avoid running out of free segments
  f2fs: Remove f2fs_write_node_page()
  f2fs: Remove f2fs_write_meta_page()
  f2fs: Remove f2fs_write_data_page()
  f2fs: Remove check for ->writepage
  Revert "f2fs: rebuild nat_bits during umount"
  f2fs: fix to avoid accessing uninitialized curseg
  ...
2025-03-27 12:55:54 -07:00
Linus Torvalds a86c6d0b2a fscrypt updates for 6.15
A fix for an issue where CONFIG_FS_ENCRYPTION could be enabled without
 some of its dependencies, and a small documentation update.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZ+CBShQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKxj0AQC4OlE/HbIY2w3zO8Az9WEF9+4Dz9od
 EpxTwO/fk9PC+gD/Z7r8J3xn+ykcy9QcZW+Qucd64k+rbmvqj36gXce6hAI=
 =r+lX
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux

Pull fscrypt updates from Eric Biggers:
 "A fix for an issue where CONFIG_FS_ENCRYPTION could be enabled without
  some of its dependencies, and a small documentation update"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  fscrypt: mention init_on_free instead of page poisoning
  fscrypt: drop obsolete recommendation to enable optimized ChaCha20
  Revert "fscrypt: relax Kconfig dependencies for crypto API algorithms"
2025-03-25 18:31:38 -07:00
Linus Torvalds bdab2977e4 fsverity updates for 6.15
A fix for an issue where CONFIG_FS_VERITY could be enabled without some
 of its dependencies, and a small documentation update.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZ+CBZBQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK5WSAP9sihsazS5s2FwkY2ukwxuPBm448O6R
 w457RM/j2Dz6oQEAhQF4oKU6mWJk0dOFMoBJCEUUJinb1diYTE0BMtSbXAk=
 =BwSf
 -----END PGP SIGNATURE-----

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity updates from Eric Biggers:
 "A fix for an issue where CONFIG_FS_VERITY could be enabled without
  some of its dependencies, and a small documentation update"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  Revert "fsverity: relax build time dependency on CRYPTO_SHA256"
  Documentation: add a usecase for FS_IOC_READ_VERITY_METADATA
2025-03-25 18:30:23 -07:00
Linus Torvalds f81c2b8150 It has been a reasonably busy cycle for docs...
- Significant changes throughout the tree to bring Python code up to
   current standards and raise the minimum Python required to 3.9.  Much of
   this is preparatory to replacing the ancient Perl scripts/kernel-doc
   horror with a slightly less horrifying Python implementation, expected
   for 6.16.
 
 - Update the minimum Sphinx required to 3.4.3, allowing us to remove a
   bunch of older compatibility code.
 
 - Rework and improve the generation of the ABI documentation.
 
   (All of the above done by Mauro)
 
 - Lots of translation updates.  Alex Shi and Yanteng Si are taking on
   responsibility for the Chinese translations going forward; that work will
   still get to you via docs-next
 
 - Try to standardize the format for indicating a developer's affiliation in
   commit tags.
 
 - Clarify the TAB's role in CoC enforcement actions.
 
 - Try to spell out the rules for when a commit tag can name another
   developer without their explicit permission.
 
 Plus lots of other typo fixes and updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmfccxwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YoYcH/jL/nS8YAiJ3awF5PH5tR3m5ddt9l+fKXWJx
 PB3KcHtDORbWltTA+Tvo2aP1jxGY9wqsIIvl+nvjJyUcfd72g4HNfTDUDXwP3OFU
 wTkaEAQp3n/hqnLXtJ2AzV3Ir5cIfEL2d7F6QsN1Gnof8iu2OuMk5iMeb0iexUX6
 FYjJq+jknh30VdAp2hxHy8q17R7h7PySh5OsjeAYJJroLv60n3DwQgnzHjXC/FT2
 Qq1UuEzlSpRoso2o2NwVTND6OVW081umo6YrioqD7ZC2G2fhRgLFJJtJGXDNcyUl
 gQv9xLSaTD97V4zaWPm28ObNBpY/GnAd4hMjB17wAH5xUfVS5Aw=
 =Gvdp
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.15' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a reasonably busy cycle for docs...

   - Significant changes throughout the tree to bring Python code up to
     current standards and raise the minimum Python required to 3.9

     Much of this is preparatory to replacing the ancient Perl
     scripts/kernel-doc horror with a slightly less horrifying Python
     implementation, expected for 6.16

   - Update the minimum Sphinx required to 3.4.3, allowing us to remove
     a bunch of older compatibility code

   - Rework and improve the generation of the ABI documentation

  (All of the above done by Mauro)

   - Lots of translation updates. Alex Shi and Yanteng Si are taking on
     responsibility for the Chinese translations going forward; that
     work will still get to you via docs-next

   - Try to standardize the format for indicating a developer's
     affiliation in commit tags

   - Clarify the TAB's role in CoC enforcement actions

   - Try to spell out the rules for when a commit tag can name another
     developer without their explicit permission

  Plus lots of other typo fixes and updates"

* tag 'docs-6.15' of git://git.lwn.net/linux: (98 commits)
  docs/zh_CN: fix spelling mistake
  docs/Chinese: change the disclaimer words
  docs/zh_CN: Add snp-tdx-threat-model index Chinese translation
  docs: driver-api: firmware: clarify userspace requirements
  docs: clarify rules wrt tagging other people
  docs: Remove outdated highuid.rst documentation
  Documentation: dma-buf: heaps: Add heap name definitions
  docs/.../submit-checklist: Use Documentation/admin-guide/abi.rst for cross-ref of README
  docs: Correct installation instruction
  Documentation: kcsan: fix "Plain Accesses and Data Races" URL in kcsan.rst
  Documentation/CoC: Spell out the TAB role in enforcement decisions
  Documentation: ocxl.rst: Update consortium site
  scripts: get_feat.pl: substitute s390x with s390
  scripts/kernel-doc: drop dead code for Wcontents_before_sections
  scripts/kernel-doc: don't add not needed new lines
  docs: driver-api/infiniband.rst: fix Kerneldoc markup
  drivers: firewire: firewire-cdev.h: fix identation on a kernel-doc markup
  drivers: media: intel-ipu3.h: fix identation on a kernel-doc markup
  include/asm-generic/io.h: fix kerneldoc markup
  Docs/arch/arm64: Fix spelling in amu.rst
  ...
2025-03-24 18:42:27 -07:00
Linus Torvalds aaca83f7b1 vfs-6.15-rc1.sysv
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rswAKCRCRxhvAZXjc
 ogkpAQD7/kR7QiOQTdDztDtAavZELOE3p4pUZfAcg8XFfH8QKQD/cgDpxgBbyvSQ
 VBInBrv1vPeSvlPWvFkyqT/n8eiR9AQ=
 =KQVR
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.sysv' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs sysv removal from Christian Brauner:
 "This removes the sysv filesystem. We've discussed this various times.

  It's time to try"

* tag 'vfs-6.15-rc1.sysv' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  sysv: Remove the filesystem
2025-03-24 11:35:53 -07:00
Linus Torvalds 26d8e43079 vfs-6.15-rc1.async.dir
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rNwAKCRCRxhvAZXjc
 onBJAP9Z8Ywmlb5KQ1E3HvDmkwyY6yOSyZ9/CmbzrkCJ8ywYkQD/d9/xt0EP/O/q
 N8YtzXArHWt7u0YbcVpy9WK3F72BdwU=
 =VJgY
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs async dir updates from Christian Brauner:
 "This contains cleanups that fell out of the work from async directory
  handling:

   - Change kern_path_locked() and user_path_locked_at() to never return
     a negative dentry. This simplifies the usability of these helpers
     in various places

   - Drop d_exact_alias() from the remaining place in NFS where it is
     still used. This also allows us to drop the d_exact_alias() helper
     completely

   - Drop an unnecessary call to fh_update() from nfsd_create_locked()

   - Change i_op->mkdir() to return a struct dentry

     Change vfs_mkdir() to return a dentry provided by the filesystems
     which is hashed and positive. This allows us to reduce the number
     of cases where the resulting dentry is not positive to very few
     cases. The code in these places becomes simpler and easier to
     understand.

   - Repack DENTRY_* and LOOKUP_* flags"

* tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  doc: fix inline emphasis warning
  VFS: Change vfs_mkdir() to return the dentry.
  nfs: change mkdir inode_operation to return alternate dentry if needed.
  fuse: return correct dentry for ->mkdir
  ceph: return the correct dentry on mkdir
  hostfs: store inode in dentry after mkdir if possible.
  Change inode_operations.mkdir to return struct dentry *
  nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
  nfs/vfs: discard d_exact_alias()
  VFS: add common error checks to lookup_one_qstr_excl()
  VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  VFS: repack LOOKUP_ bit flags.
  VFS: repack DENTRY_ flags.
2025-03-24 10:47:14 -07:00
Linus Torvalds 804382d59b vfs-6.15-rc1.overlayfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rUAAKCRCRxhvAZXjc
 opI3AP9ws4S/JXOjxNKoTYmNM2nZ8+r1v8tUxbLIiqdvzx9PygD/V1ZjXtn6lwZr
 OK8d5Y8UnlPZTlBF8D61op3AjnXYzws=
 =KV4p
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs overlayfs updates from Christian Brauner:
 "Currently overlayfs uses the mounter's credentials for its
  override_creds() calls. That provides a consistent permission model.

  This patches allows a caller to instruct overlayfs to use its
  credentials instead. The caller must be located in the same user
  namespace hierarchy as the user namespace the overlayfs instance will
  be mounted in. This provides a consistent and simple security model.

  With this it is possible to e.g., mount an overlayfs instance where
  the mounter must have CAP_SYS_ADMIN but the credentials used for
  override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage
  of custom fs{g,u}id different from the callers and other tweaks"

* tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/ovl: add third selftest for "override_creds"
  selftests/ovl: add second selftest for "override_creds"
  selftests/filesystems: add utils.{c,h}
  selftests/ovl: add first selftest for "override_creds"
  ovl: allow to specify override credentials
2025-03-24 10:37:40 -07:00
Linus Torvalds 0ec0d4ecdd vfs-6.15-rc1.iomap
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rGwAKCRCRxhvAZXjc
 okVnAP9VgaYjWGzaeep/dLzWtu7C/Cg5Swl1P84Vj+SJ+hFPEAD/auzWTV0D0Ko5
 5GLyUsLZehfeVDOSRqmiyt1po8iVsQo=
 =ANks
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs iomap updates from Christian Brauner:

 - Allow the filesystem to submit the writeback bios.

    - Allow the filsystem to track completions on a per-bio bases
      instead of the entire I/O.

    - Change writeback_ops so that ->submit_bio can be done by the
      filesystem.

    - A new ANON_WRITE flag for writes that don't have a block number
      assigned to them at the iomap level leaving the filesystem to do
      that work in the submission handler.

 - Incremental iterator advance

   The folio_batch support for zero range where the filesystem provides
   a batch of folios to process that might not be logically continguous
   requires more flexibility than the current offset based iteration
   currently offers.

   Update all iomap operations to advance the iterator within the
   operation and thus remove the need to advance from the core iomap
   iterator.

 - Make buffered writes work with RWF_DONTCACHE

   If RWF_DONTCACHE is set for a write, mark the folios being written as
   uncached. On writeback completion the pages will be dropped.

 - Introduce infrastructure for large atomic writes

   This will eventually be used by xfs and ext4.

* tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits)
  iomap: rework IOMAP atomic flags
  iomap: comment on atomic write checks in iomap_dio_bio_iter()
  iomap: inline iomap_dio_bio_opflags()
  iomap: fix inline data on buffered read
  iomap: Lift blocksize restriction on atomic writes
  iomap: Support SW-based atomic writes
  iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW
  xfs: flag as supporting FOP_DONTCACHE
  iomap: make buffered writes work with RWF_DONTCACHE
  iomap: introduce a full map advance helper
  iomap: rename iomap_iter processed field to status
  iomap: remove unnecessary advance from iomap_iter()
  dax: advance the iomap_iter on pte and pmd faults
  dax: advance the iomap_iter on dedupe range
  dax: advance the iomap_iter on unshare range
  dax: advance the iomap_iter on zero range
  dax: push advance down into dax_iomap_iter() for read and write
  dax: advance the iomap_iter in the read/write path
  iomap: convert misc simple ops to incremental advance
  iomap: advance the iter on direct I/O
  ...
2025-03-24 10:19:31 -07:00
Linus Torvalds 99c21beaab vfs-6.15-rc1.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90p4AAKCRCRxhvAZXjc
 ojMIAP9atkG3u7+490+NGWLdulQlaHnD51Owa9MiW87UfKpsTQEArwi/NrJqXJNT
 PFQ2xIa5TxG+9haChR89w3kjZ6b/hgs=
 =iDkx
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:

   - Add CONFIG_DEBUG_VFS infrastucture:
      - Catch invalid modes in open
      - Use the new debug macros in inode_set_cached_link()
      - Use debug-only asserts around fd allocation and install

   - Place f_ref to 3rd cache line in struct file to resolve false
     sharing

Cleanups:

   - Start using anon_inode_getfile_fmode() helper in various places

   - Don't take f_lock during SEEK_CUR if exclusion is guaranteed by
     f_pos_lock

   - Add unlikely() to kcmp()

   - Remove legacy ->remount_fs method from ecryptfs after port to the
     new mount api

   - Remove invalidate_inodes() in favour of evict_inodes()

   - Simplify ep_busy_loopER by removing unused argument

   - Avoid mmap sem relocks when coredumping with many missing pages

   - Inline getname()

   - Inline new_inode_pseudo() and de-staticize alloc_inode()

   - Dodge an atomic in putname if ref == 1

   - Consistently deref the files table with rcu_dereference_raw()

   - Dedup handling of struct filename init and refcounts bumps

   - Use wq_has_sleeper() in end_dir_add()

   - Drop the lock trip around I_NEW wake up in evict()

   - Load the ->i_sb pointer once in inode_sb_list_{add,del}

   - Predict not reaching the limit in alloc_empty_file()

   - Tidy up do_sys_openat2() with likely/unlikely

   - Call inode_sb_list_add() outside of inode hash lock

   - Sort out fd allocation vs dup2 race commentary

   - Turn page_offset() into a wrapper around folio_pos()

   - Remove locking in exportfs around ->get_parent() call

   - try_lookup_one_len() does not need any locks in autofs

   - Fix return type of several functions from long to int in open

   - Fix return type of several functions from long to int in ioctls

  Fixes:

   - Fix watch queue accounting mismatch"

* tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
  fs: sort out fd allocation vs dup2 race commentary, take 2
  fs: call inode_sb_list_add() outside of inode hash lock
  fs: tidy up do_sys_openat2() with likely/unlikely
  fs: predict not reaching the limit in alloc_empty_file()
  fs: load the ->i_sb pointer once in inode_sb_list_{add,del}
  fs: drop the lock trip around I_NEW wake up in evict()
  fs: use wq_has_sleeper() in end_dir_add()
  VFS/autofs: try_lookup_one_len() does not need any locks
  fs: dedup handling of struct filename init and refcounts bumps
  fs: consistently deref the files table with rcu_dereference_raw()
  exportfs: remove locking around ->get_parent() call.
  fs: use debug-only asserts around fd allocation and install
  fs: dodge an atomic in putname if ref == 1
  vfs: Remove invalidate_inodes()
  ecryptfs: remove NULL remount_fs from super_operations
  watch_queue: fix pipe accounting mismatch
  fs: place f_ref to 3rd cache line in struct file to resolve false sharing
  epoll: simplify ep_busy_loop by removing always 0 argument
  fs: Turn page_offset() into a wrapper around folio_pos()
  kcmp: improve performance adding an unlikely hint to task comparisons
  ...
2025-03-24 09:13:50 -07:00
Tuomas Ahola 34ceb69edd Documentation/fs/9p: fix broken link
In b529c06f9d (Update the documentation referencing Plan 9 from User
Space., 2020-04-26), another instance of the link was left unfixed.
Fix that as well.

Signed-off-by: Tuomas Ahola <taahol@utu.fi>
Message-ID: <20250322153639.4917-1-taahol@utu.fi>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2025-03-23 06:20:36 +09:00
Nico Pache 0bfd458685 MM documentation: add "Unaccepted" meminfo entry
Commit dcdfdd40fa ("mm: Add support for unaccepted memory") added a
entry to meminfo but did not document it in the proc.rst file.

This counter tracks the amount of "Unaccepted" guest memory for some
Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP.

Add the missing entry in the documentation.

Link: https://lkml.kernel.org/r/20250317230403.79632-1-npache@redhat.com
Signed-off-by: Nico Pache <npache@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:14 -07:00
Nico Pache 835de37603 meminfo: add a per node counter for balloon drivers
Patch series "track memory used by balloon drivers", v2.

This series introduces a way to track memory used by balloon drivers.

Add a NR_BALLOON_PAGES counter to track how many pages are reclaimed by
the balloon drivers.  First add the accounting, then updates the balloon
drivers (virtio, Hyper-V, VMware, Pseries-cmm, and Xen) to maintain this
counter.  The virtio, Vmware, and pseries-cmm balloon drivers utilize the
balloon_compaction interface to allocate and free balloon pages.  Other
balloon drivers will have to maintain this counter manually.

This makes the information visible in memory reporting interfaces like
/proc/meminfo, show_mem, and OOM reporting.

This provides admins visibility into their VM balloon sizes without
requiring different virtualization tooling.  Furthermore, this information
is helpful when debugging an OOM inside a VM.


This patch (of 4):

Add NR_BALLOON_PAGES counter to track memory used by balloon drivers and
expose it through /proc/meminfo and other memory reporting interfaces.

[npache@redhat.com: document Balloon Meminfo entry]
  Link: https://lkml.kernel.org/r/a0315ccf-f244-460e-8643-fd7388724fe5@redhat.com
Link: https://lkml.kernel.org/r/20250314213757.244258-1-npache@redhat.com
Link: https://lkml.kernel.org/r/20250314213757.244258-2-npache@redhat.com
Signed-off-by: Nico Pache <npache@redhat.com>
Cc: Alexander Atanasov <alexander.atanasov@virtuozzo.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Juegren Gross <jgross@suse.com>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:13 -07:00
John Garry 370a6de765
iomap: rework IOMAP atomic flags
Flag IOMAP_ATOMIC_SW is not really required. The idea of having this flag
is that the FS ->iomap_begin callback could check if this flag is set to
decide whether to do a SW (FS-based) atomic write. But the FS can set
which ->iomap_begin callback it wants when deciding to do a FS-based
atomic write.

Furthermore, it was thought that IOMAP_ATOMIC_HW is not a proper name, as
the block driver can use SW-methods to emulate an atomic write. So change
back to IOMAP_ATOMIC.

The ->iomap_begin callback needs though to indicate to iomap core that
REQ_ATOMIC needs to be set, so add IOMAP_F_ATOMIC_BIO for that.

These changes were suggested by Christoph Hellwig and Dave Chinner.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250320120250.4087011-4-john.g.garry@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-20 15:16:03 +01:00
David Hildenbrand 749492229e mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO_PAGE_MAPCOUNT)
Everything is in place to stop using the per-page mapcounts in large
folios: the mapcount of tail pages will always be logically 0 (-1 value),
just like it currently is for hugetlb folios already, and the page
mapcount of the head page is either 0 (-1 value) or contains a page type
(e.g., hugetlb).

Maintaining _nr_pages_mapped without per-page mapcounts is impossible, so
that one also has to go with CONFIG_NO_PAGE_MAPCOUNT.

There are two remaining implications:

(1) Per-node, per-cgroup and per-lruvec stats of "NR_ANON_MAPPED"
    ("mapped anonymous memory") and "NR_FILE_MAPPED"
    ("mapped file memory"):

    As soon as any page of the folio is mapped -- folio_mapped() -- we
    now account the complete folio as mapped. Once the last page is
    unmapped -- !folio_mapped() -- we account the complete folio as
    unmapped.

    This implies that ...

    * "AnonPages" and "Mapped" in /proc/meminfo and
      /sys/devices/system/node/*/meminfo
    * cgroup v2: "anon" and "file_mapped" in "memory.stat" and
      "memory.numa_stat"
    * cgroup v1: "rss" and "mapped_file" in "memory.stat" and
      "memory.numa_stat

    ... can now appear higher than before. But note that these folios do
    consume that memory, simply not all pages are actually currently
    mapped.

    It's worth nothing that other accounting in the kernel (esp. cgroup
    charging on allocation) is not affected by this change.

    [why oh why is "anon" called "rss" in cgroup v1]

 (2) Detecting partial mappings

     Detecting whether anon THPs are partially mapped gets a bit more
     unreliable. As long as a single MM maps such a large folio
     ("exclusively mapped"), we can reliably detect it. Especially before
     fork() / after a short-lived child process quit, we will detect
     partial mappings reliably, which is the common case.

     In essence, if the average per-page mapcount in an anon THP is < 1,
     we know for sure that we have a partial mapping.

     However, as soon as multiple MMs are involved, we might miss detecting
     partial mappings: this might be relevant with long-lived child
     processes. If we have a fully-mapped anon folio before fork(), once
     our child processes and our parent all unmap (zap/COW) the same pages
     (but not the complete folio), we might not detect the partial mapping.
     However, once the child processes quit we would detect the partial
     mapping.

     How relevant this case is in practice remains to be seen.
     Swapout/migration will likely mitigate this.

     In the future, RMAP walkers could check for that for that case
     (e.g., when collecting access bits during reclaim) and simply flag
     them for deferred-splitting.

Link: https://lkml.kernel.org/r/20250303163014.1128035-21-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirks^H^Hski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: tejun heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:06:48 -07:00
David Hildenbrand 6dd55dd1c5 fs/proc/task_mmu: remove per-page mapcount dependency for smaps/smaps_rollup (CONFIG_NO_PAGE_MAPCOUNT)
Let's implement an alternative when per-page mapcounts in large folios are
no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT.

When computing the output for smaps / smaps_rollups, in particular when
calculating the USS (Unique Set Size) and the PSS (Proportional Set Size),
we still rely on per-page mapcounts.

To determine private vs.  shared, we'll use folio_likely_mapped_shared(),
similar to how we handle PM_MMAP_EXCLUSIVE.  Similarly, we might now
under-estimate the USS and count pages towards "shared" that are actually
"private" ("exclusively mapped").

When calculating the PSS, we'll now also use the average per-page mapcount
for large folios: this can result in both, an over-estimation and an
under-estimation of the PSS.  The difference is not expected to matter
much in practice, but we'll have to learn as we go.

We can now provide folio_precise_page_mapcount() only with
CONFIG_PAGE_MAPCOUNT, and remove one of the last users of per-page
mapcounts when CONFIG_NO_PAGE_MAPCOUNT is enabled.

Document the new behavior.

Link: https://lkml.kernel.org/r/20250303163014.1128035-20-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirks^H^Hski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: tejun heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:06:47 -07:00
David Hildenbrand 7a34ae1449 fs/proc/task_mmu: remove per-page mapcount dependency for "mapmax" (CONFIG_NO_PAGE_MAPCOUNT)
Let's implement an alternative when per-page mapcounts in large folios are
no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT.

For calculating "mapmax", we now use the average per-page mapcount in a
large folio instead of the per-page mapcount.

For hugetlb folios and folios that are not partially mapped into MMs,
there is no change.

Likely, this change will not matter much in practice, and an alternative
might be to simple remove this stat with CONFIG_NO_PAGE_MAPCOUNT. 
However, there might be value to it, so let's keep it like that and
document the behavior.

Link: https://lkml.kernel.org/r/20250303163014.1128035-19-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirks^H^Hski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: tejun heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:06:47 -07:00
Dan Williams 653d7825c1 dcssblk: mark DAX broken, remove FS_DAX_LIMITED support
The dcssblk driver has long needed special case supoprt to enable limited
dax operation, so called CONFIG_FS_DAX_LIMITED.  This mode works around
the incomplete support for ZONE_DEVICE on s390 by forgoing the ability of
dax-mapped pages to support GUP.

Now, pending cleanups to fsdax that fix its reference counting [1] depend
on the ability of all dax drivers to supply ZONE_DEVICE pages.

To allow that work to move forward, dax support needs to be paused for
dcssblk until ZONE_DEVICE support arrives.  That work has been known for a
few years [2], and the removal of "pte_devmap" requirements [3] makes the
conversion easier.

For now, place the support behind CONFIG_BROKEN, and remove PFN_SPECIAL
(dcssblk was the only user).

Link: http://lore.kernel.org/cover.9f0e45d52f5cff58807831b6b867084d0b14b61c.1725941415.git-series.apopple@nvidia.com [1]
Link: http://lore.kernel.org/20210820210318.187742e8@thinkpad/ [2]
Link: http://lore.kernel.org/4511465a4f8429f45e2ac70d2e65dc5e1df1eb47.1725941415.git-series.apopple@nvidia.com [3]
Link: https://lkml.kernel.org/r/33eef2379c0d240f40cc15453fad2df1a4ae34c8.1740713401.git-series.apopple@nvidia.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Asahi Lina <lina@asahilina.net>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linmiaohe <linmiaohe@huawei.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:06:40 -07:00
Andrii Nakryiko 87ad827a27 docs,procfs: document /proc/PID/* access permission checks
Add a paragraph explaining what sort of capabilities a process would need
to read procfs data for some other process.  Also mention that reading
data for its own process doesn't require any extra permissions.

Link: https://lkml.kernel.org/r/20250129001747.759990-1-andrii@kernel.org
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:30:47 -07:00
Bagas Sanjaya a42d685ff2 Documentation: bcachefs: SubmittingPatches: Convert footnotes to reST syntax
Footnotes list are outputted in htmldocs simply as long-running
paragraph instead. Use reST numbered footnotes syntax for the job.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya 76d6305dca Documentation: bcachefs: SubmittingPatches: Demote section headings
SubmttingPatches.rst has 4 section headings, all under the same heading
levels. In absence of title headings, these section headings are all
ended up as title headings in the docs output, which also affect
the index toctree (increasing titles to 6 from the original 2)
due to :numbered: option.

Demote second-to-last section headings, making "Submitting patches
to bcachefs" as title heading.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya 93422e0b33 Documentation: bcachefs: Split index toctree
bcachefs subsystem currently has 4 docs: two are development notes and
the rest are actual filesystem docs. These two groups are clearly
distinct and can be organized.

Split the toctree into two, one for each docs group. While at it, also
reduce :maxdepth: so that only title headings are listed in the
toctrees.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya 7442ef7082 Documentation: bcachefs: Add casefolding toctree entry
Sphinx reports htmldocs toctree warning:

Documentation/filesystems/bcachefs/casefolding.rst: WARNING: document isn't included in any toctree

Fix the warning by adding casefolding documentation entry to bcachefs
toctree.

Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250221161728.32739f85@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya 47d4100b15 Documentation: bcachefs: casefolding: Use bullet list for dirent structure
The doc lists dirent structure for both regular and casefolded names,
yet it is written (and rendered) as long paragraph instead.

Write the structure list as bullet list.

Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya 210997859a Documentation: bcachefs: casefolding: Fix dentry/dcache considerations section
Sphinx reports htmldocs warnings on dentry/dcache section:

Documentation/filesystems/bcachefs/casefolding.rst:75: WARNING: Title underline too short.

dentry/dcache considerations
--------- [docutils]
Documentation/filesystems/bcachefs/casefolding.rst:84: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]

Fix the section by:

* Extending the section underline to match the section title length;
* Separating problem list from surrounding paragraphs.

Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250221161911.2d16138b@canb.auug.org.au/
Closes: https://lore.kernel.org/linux-next/20250221162135.79be0147@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Bagas Sanjaya 82b5666912 Documentation: bcachefs: casefolding: Do not italicize NUL
Sphinx reports htmldocs warning:

Documentation/filesystems/bcachefs/casefolding.rst:36: WARNING: Inline interpreted text or phrase reference start-string without end-string. [docutils]

That's because NUL word is italicized but it is written in plural form
instead (`NUL`s). Sphinx, however, doesn't tip over when the italicized
word in this fashion is followed by punctuation instead.

Do not italicize the word to keep Sphinx happy.

Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250221162135.79be0147@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Joshua Ashton d37c14ac6f bcachefs: bcachefs_metadata_version_casefolding
This patch implements support for case-insensitive file name lookups
in bcachefs.

The implementation uses the same UTF-8 lowering and normalization that
ext4 and f2fs is using.

More information is provided in Documentation/bcachefs/casefolding.rst

Compatibility notes:

This uses the new versioning scheme for incompatible features where an
incompatible feature is tied to a version number: the superblock says
"we may use incompat features up to x" and "incompat features up to x
are in use", disallowing mounting by previous versions.

Additionally, and old style incompat feature bit is used, so that
kernels without utf8 casefolding support know if casefolding
specifically is in use and they're allowed to mount.

Signed-off-by: Joshua Ashton <joshua@froggi.es>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:15 -04:00
Chao Yu 1788971e0b f2fs: introduce FAULT_INCONSISTENT_FOOTER
To simulate inconsistent node footer error.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 03:25:53 +00:00
Mike Snitzer 9254c8ae9b nfsd: disallow file locking and delegations for NFSv4 reexport
We do not and cannot support file locking with NFS reexport over
NFSv4.x for the same reason we don't do it for NFSv3: NFS reexport
server reboot cannot allow clients to recover locks because the source
NFS server has not rebooted, and so it is not in grace.  Since the
source NFS server is not in grace, it cannot offer any guarantees that
the file won't have been changed between the locks getting lost and
any attempt to recover/reclaim them.  The same applies to delegations
and any associated locks, so disallow them too.

Clients are no longer allowed to get file locks or delegations from a
reexport server, any attempts will fail with operation not supported.

Update the "Reboot recovery" section accordingly in
Documentation/filesystems/nfs/reexport.rst

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:08 -04:00
Chao Yu c2ecba0265 f2fs: control nat_bits feature via mount option
Introduce a new mount option "nat_bits" to control nat_bits feature,
by default nat_bits feature is disabled.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-08 16:04:10 +00:00
Jan Kara 93fd0d46cb
vfs: Remove invalidate_inodes()
The function can be replaced by evict_inodes. The only difference is
that evict_inodes() skips the inodes with positive refcount without
touching ->i_lock, but they are equivalent as evict_inodes() repeats the
refcount check after having grabbed ->i_lock.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20250307144318.28120-2-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-08 12:19:22 +01:00
John Garry 794ca29dcc
iomap: Support SW-based atomic writes
Currently atomic write support requires dedicated HW support. This imposes
a restriction on the filesystem that disk blocks need to be aligned and
contiguously mapped to FS blocks to issue atomic writes.

XFS has no method to guarantee FS block alignment for regular,
non-RT files. As such, atomic writes are currently limited to 1x FS block
there.

To deal with the scenario that we are issuing an atomic write over
misaligned or discontiguous data blocks - and raise the atomic write size
limit - support a SW-based software emulated atomic write mode. For XFS,
this SW-based atomic writes would use CoW support to issue emulated untorn
writes.

It is the responsibility of the FS to detect discontiguous atomic writes
and switch to IOMAP_DIO_ATOMIC_SW mode and retry the write. Indeed,
SW-based atomic writes could be used always when the mounted bdev does
not support HW offload, but this strategy is not initially expected to be
used.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250303171120.2837067-6-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06 11:00:12 +01:00
John Garry b4de0e9be9
iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW
In future xfs will support a SW-based atomic write, so rename
IOMAP_ATOMIC -> IOMAP_ATOMIC_HW to be clear which mode is being used.

Also relocate setting of IOMAP_ATOMIC_HW to the write path in
__iomap_dio_rw(), to be clear that this flag is only relevant to writes.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250303171120.2837067-3-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06 11:00:12 +01:00
Christian Brauner 1743d385e7
Merge branch 'vfs-6.15.shared.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Bring in iomap changes that xfs relies on.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06 10:59:18 +01:00
Aiden Ma 50dc696c3a
doc: correcting two prefix errors in idmappings.rst
Add the 'k' prefix to id 21000. And id `u1000` in the third
idmapping should be mapped to `k31000`, not `u31000`.

Signed-off-by: Aiden Ma <jiaheng.ma@foxmail.com>
Link: https://lore.kernel.org/r/tencent_4E7B1F143E8051530C21FCADF4E014DCBB06@qq.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 11:54:18 +01:00
Christian Brauner be66901997
doc: fix inline emphasis warning
Fix a warning spotted by linux-next build (htmldocs):

Documentation/filesystems/porting.rst:1186: WARNING: Inline emphasis start-string without end-string. [docutils]

Introduced by commit

  88d5baf690 ("Change inode_operations.mkdir to return struct dentry *")

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 88d5baf690 ("Change inode_operations.mkdir to return struct dentry *")
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 11:52:50 +01:00
Eric Biggers 13dc8eb900 fscrypt: mention init_on_free instead of page poisoning
Page poisoning is an older debug option.  The modern way to initialize
memory on free for security reasons is to set init_on_free=1.

Link: https://lore.kernel.org/r/20250304210156.14912-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-03-04 13:02:45 -08:00
Eric Biggers eea957d8db fscrypt: drop obsolete recommendation to enable optimized ChaCha20
Since the crypto kconfig options are being fixed to enable optimized
ChaCha20 automatically
(https://lore.kernel.org/r/Z8AY16EIqAYpfmRI@gondor.apana.org.au/), it is
no longer necessary to give a recommendation to enable it.

Link: https://lore.kernel.org/r/20250304205501.13797-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-03-04 12:56:06 -08:00
NeilBrown 88d5baf690
Change inode_operations.mkdir to return struct dentry *
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have
complete control of sequencing on the actual filesystem (e.g.  on a
different server) and may find that the inode created for a mkdir
request already exists in the icache and dcache by the time the mkdir
request returns.  For example, if the filesystem is mounted twice the
directory could be visible on the other mount before it is on the
original mount, and a pair of name_to_handle_at(), open_by_handle_at()
calls could instantiate the directory inode with an IS_ROOT() dentry
before the first mkdir returns.

This means that the dentry passed to ->mkdir() may not be the one that
is associated with the inode after the ->mkdir() completes.  Some
callers need to interact with the inode after the ->mkdir completes and
they currently need to perform a lookup in the (rare) case that the
dentry is no longer hashed.

This lookup-after-mkdir requires that the directory remains locked to
avoid races.  Planned future patches to lock the dentry rather than the
directory will mean that this lookup cannot be performed atomically with
the mkdir.

To remove this barrier, this patch changes ->mkdir to return the
resulting dentry if it is different from the one passed in.
Possible returns are:
  NULL - the directory was created and no other dentry was used
  ERR_PTR() - an error occurred
  non-NULL - this other dentry was spliced in

This patch only changes file-systems to return "ERR_PTR(err)" instead of
"err" or equivalent transformations.  Subsequent patches will make
further changes to some file-systems to return a correct dentry.

Not all filesystems reliably result in a positive hashed dentry:

- NFS, cifs, hostfs will sometimes need to perform a lookup of
  the name to get inode information.  Races could result in this
  returning something different. Note that this lookup is
  non-atomic which is what we are trying to avoid.  Placing the
  lookup in filesystem code means it only happens when the filesystem
  has no other option.
- kernfs and tracefs leave the dentry negative and the ->revalidate
  operation ensures that lookup will be called to correctly populate
  the dentry.  This could be fixed but I don't think it is important
  to any of the users of vfs_mkdir() which look at the dentry.

The recommendation to use
    d_drop();d_splice_alias()
is ugly but fits with current practice.  A planned future patch will
change this.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27 20:00:17 +01:00
Jens Axboe b2cd5ae693
iomap: make buffered writes work with RWF_DONTCACHE
Add iomap buffered write support for RWF_DONTCACHE. If RWF_DONTCACHE is
set for a write, mark the folios being written as uncached. Then
writeback completion will drop the pages. The write_iter handler simply
kicks off writeback for the pages, and writeback completion will take
care of the rest.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20250204184047.356762-2-axboe@kernel.dk
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27 11:27:54 +01:00
Christian Brauner 71628584df
Merge patch series "prep patches for my mkdir series"
NeilBrown <neilb@suse.de> says:

These two patches are cleanup are dependencies for my mkdir changes and
subsequence directory locking changes.

* patches from https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de: (2 commits)
  nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
  nfs/vfs: discard d_exact_alias()

Link: https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27 09:25:34 +01:00
Jan Kara 448fa70158
sysv: Remove the filesystem
Since 2002 (change "Replace BKL for chain locking with sysvfs-private
rwlock") the sysv filesystem was doing IO under a rwlock in its
get_block() function (yes, a non-sleepable lock hold over a function
used to read inode metadata for all reads and writes).  Nobody noticed
until syzbot in 2023 [1]. This shows nobody is using the filesystem.
Just drop it.

[1] https://lore.kernel.org/all/0000000000000ccf9a05ee84f5b0@google.com/

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20250220163940.10155-2-jack@suse.cz
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-21 10:32:47 +01:00
Christian Brauner 539a0879de
ovl: allow to specify override credentials
Currently overlayfs uses the mounter's credentials for it's
override_creds() calls. That provides a consistent permission model.

This patches allows a caller to instruct overlayfs to use its
credentials instead. The caller must be located in the same user
namespace hierarchy as the user namespace the overlayfs instance will be
mounted in. This provides a consistent and simple security model.

With this it is possible to e.g., mount an overlayfs instance where the
mounter must have CAP_SYS_ADMIN but the credentials used for
override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage of
custom fs{g,u}id different from the callers and other tweaks.

Link: https://lore.kernel.org/r/20250219-work-overlayfs-v3-1-46af55e4ceda@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19 14:32:09 +01:00
NeilBrown 204a575e91
VFS: add common error checks to lookup_one_qstr_excl()
Callers of lookup_one_qstr_excl() often check if the result is negative or
positive.
These changes can easily be moved into lookup_one_qstr_excl() by checking the
lookup flags:
LOOKUP_CREATE means it is NOT an error if the name doesn't exist.
LOOKUP_EXCL means it IS an error if the name DOES exist.

This patch adds these checks, then removes error checks from callers,
and ensures that appropriate flags are passed.

This subtly changes the meaning of LOOKUP_EXCL.  Previously it could
only accompany LOOKUP_CREATE.  Now it can accompany LOOKUP_RENAME_TARGET
as well.  A couple of small changes are needed to accommodate this.  The
NFS change is functionally a no-op but ensures nfs_is_exclusive_create() does
exactly what the name says.

Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250217003020.3170652-3-neilb@suse.de
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19 14:09:15 +01:00
NeilBrown 1c3cb50b58
VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
No callers of kern_path_locked() or user_path_locked_at() want a
negative dentry.  So change them to return -ENOENT instead.  This
simplifies callers.

This results in a subtle change to bcachefs in that an ioctl will now
return -ENOENT in preference to -EXDEV.  I believe this restores the
behaviour to what it was prior to
 Commit bbe6a7c899 ("bch2_ioctl_subvolume_destroy(): fix locking")

Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250217003020.3170652-2-neilb@suse.de
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19 14:08:41 +01:00
Allison Karlitskaya 212df80e01 Documentation: add a usecase for FS_IOC_READ_VERITY_METADATA
Mention another potential usecase for FS_IOC_READ_VERITY_METADATA:
creating filesystem images which contain fs-verity-enabled files,
without having to redo all of the work in userspace.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Link: https://lore.kernel.org/r/20241126084833.70538-1-allison.karlitskaya@redhat.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-17 11:03:29 -08:00
Charles Han 07ab93f3cc Documentation: Remove repeated word in docs
Remove the repeated word "to" docs.

Signed-off-by: Charles Han <hanchunchao@inspur.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250207073433.23604-1-hanchunchao@inspur.com
2025-02-10 10:54:50 -07:00
Ritvik Gupta 7038f9f2e8 documentation/filesystems: fix spelling mistakes
Corrected the following spelling mistakes,
based on the suggestions by codespell:

1. Optionaly   -> Optionally
2. prefereable -> preferable
3. peformance  -> performance
4. ontext      -> context
5. failuer     -> failure
6. poiners     -> pointers
7. realtively  -> relatively
8. uptream     -> upstream

Signed-off-by: Ritvik Gupta <ritvikfoss@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250210043937.30952-1-ritvikfoss@gmail.com
2025-02-10 10:42:28 -07:00
Kemeng Shi 06b9e91425 jbd2: remove unused transaction->t_private_list
After we remove ext4 journal callback, transaction->t_private_list is
not used anymore. Just remove unused transaction->t_private_list.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20241218145414.1422946-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-02-10 07:48:24 -05:00
Kent Overstreet fdfd0ad828 bcachefs docs: SubmittingPatches.rst
Add an (initial?) patch submission checklist, focusing mainly on
testing.

Yes, all patches must be tested, and that starts (but does not end) with
the patch author.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06 22:35:11 -05:00
Christoph Hellwig 034c29fb3e
iomap: add a IOMAP_F_ANON_WRITE flag
Add a IOMAP_F_ANON_WRITE flag that indicates that the write I/O does not
have a target block assigned to it yet at iomap time and the file system
will do that in the bio submission handler, splitting the I/O as needed.

This is used to implement Zone Append based I/O for zoned XFS, where
splitting writes to the hardware limits and assigning a zone to them
happens just before sending the I/O off to the block layer, but could
also be useful for other things like compressed I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250206064035.2323428-4-hch@lst.de
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06 13:02:13 +01:00
Christoph Hellwig c50105933f
iomap: allow the file system to submit the writeback bios
Change ->prepare_ioend to ->submit_ioend and require file systems that
implement it to submit the bio.  This is needed for file systems that
do their own work on the bios before submitting them to the block layer
like btrfs or zoned xfs.  To make this easier also pass the writeback
context to the method.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250206064035.2323428-2-hch@lst.de
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06 13:02:13 +01:00
Linus Torvalds d3d90cc289 Provide stable parent and name to ->d_revalidate() instances
Most of the filesystem methods where we care about dentry name
 and parent have their stability guaranteed by the callers;
 ->d_revalidate() is the major exception.
 
 It's easy enough for callers to supply stable values for
 expected name and expected parent of the dentry being
 validated.  That kills quite a bit of boilerplate in
 ->d_revalidate() instances, along with a bunch of races
 where they used to access ->d_name without sufficient
 precautions.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZ5gkoQAKCRBZ7Krx/gZQ
 6w9FAP4nyxNNWMjE1TwuWR/DNDMYYuw/qn/miZ88B5BUM8hzqgD/W2SjRvcbSaIm
 xSIYpbtKgtqNU34P1PU+dBvL8Utz2AE=
 =TWY8
 -----END PGP SIGNATURE-----

Merge tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs d_revalidate updates from Al Viro:
 "Provide stable parent and name to ->d_revalidate() instances

  Most of the filesystem methods where we care about dentry name and
  parent have their stability guaranteed by the callers;
  ->d_revalidate() is the major exception.

  It's easy enough for callers to supply stable values for expected name
  and expected parent of the dentry being validated. That kills quite a
  bit of boilerplate in ->d_revalidate() instances, along with a bunch
  of races where they used to access ->d_name without sufficient
  precautions"

* tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  9p: fix ->rename_sem exclusion
  orangefs_d_revalidate(): use stable parent inode and name passed by caller
  ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
  nfs: fix ->d_revalidate() UAF on ->d_name accesses
  nfs{,4}_lookup_validate(): use stable parent inode passed by caller
  gfs2_drevalidate(): use stable parent inode and name passed by caller
  fuse_dentry_revalidate(): use stable parent inode and name passed by caller
  vfat_revalidate{,_ci}(): use stable parent inode passed by caller
  exfat_d_revalidate(): use stable parent inode passed by caller
  fscrypt_d_revalidate(): use stable parent inode passed by caller
  ceph_d_revalidate(): propagate stable name down into request encoding
  ceph_d_revalidate(): use stable parent inode passed by caller
  afs_d_revalidate(): use stable name and parent inode passed by caller
  Pass parent directory inode and expected name to ->d_revalidate()
  generic_ci_d_compare(): use shortname_storage
  ext4 fast_commit: make use of name_snapshot primitives
  dissolve external_name.u into separate members
  make take_dentry_name_snapshot() lockless
  dcache: back inline names with a struct-wrapped array of unsigned long
  make sure that DNAME_INLINE_LEN is a multiple of word size
2025-01-30 09:13:35 -08:00
Linus Torvalds 92cc9acff7 fuse update for 6.14
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZ5nzXQAKCRDh3BK/laaZ
 PCaJAP4gw6CnxrdzuPvm7yEsINuHdavQ8aeCiimWwOC4eBzkOgD/SlMry5vwCkW9
 WOzoONVUcNIPEqYXThw77OFlkFpKGwQ=
 =JQE1
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:
 "Add support for io-uring communication between kernel and userspace
  using IORING_OP_URING_CMD (Bernd Schubert). Following features enable
  gains in performance compared to the regular interface:

   - Allow processing multiple requests with less syscall overhead

   - Combine commit of old and fetch of new fuse request

   - CPU/NUMA affinity of queues

  Patches were reviewed by several people, including Pavel Begunkov,
  io-uring co-maintainer"

* tag 'fuse-update-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: prevent disabling io-uring on active connections
  fuse: enable fuse-over-io-uring
  fuse: block request allocation until io-uring init is complete
  fuse: {io-uring} Prevent mount point hang on fuse-server termination
  fuse: Allow to queue bg requests through io-uring
  fuse: Allow to queue fg requests through io-uring
  fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static
  fuse: {io-uring} Handle teardown of ring entries
  fuse: Add io-uring sqe commit and fetch support
  fuse: {io-uring} Make hash-list req unique finding functions non-static
  fuse: Add fuse-io-uring handling into fuse_copy
  fuse: Make fuse_copy non static
  fuse: {io-uring} Handle SQEs - register commands
  fuse: make args->in_args[0] to be always the header
  fuse: Add fuse-io-uring design documentation
  fuse: Move request bits
  fuse: Move fuse_get_dev to header file
  fuse: rename to fuse_dev_end_requests and make non-static
2025-01-29 09:40:23 -08:00
Linus Torvalds b88fe2b5dd NFS Client Updates for Linux 6.14
New Features:
   * Enable using direct IO with localio
   * Added localio related tracepoints
 
 Bugfixes:
   * Sunrpc fixes for working with a very large cl_tasks list
   * Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
   * Fixes for handling reconnections with localio
   * Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
   * Fix COPY_NOTIFY xdr_buf size calculations
   * pNFS/Flexfiles fix for retrying requesting a layout segment for reads
   * Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is expired
 
 Cleanups:
   * Various other nfs & nfsd localio cleanups
   * Prepratory patches for async copy improvements that are under development
   * Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other xprts
   * Add netns inum and srcaddr to debugfs rpc_xprt info
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmeZUzUACgkQ18tUv7Cl
 QOvArw/9HltIlcJHbi7tApGJ4dFpuJCa/fHbA1n5bHvKrCR5aElmFZoiFDdsM1JX
 kFAlMED9n1dW9VmzLJcepxmrLo/t7KXueiZNharHynWTxcszSl6jS+tOFBW6OflG
 Rrrjq/SrsWI2Fu8X4e/7ZV7pqRLGGn5SSMwgbuMbcyzBvVgN8mZM/BneIp1J59AI
 5NOsif5KWetVhQc43zlRlbVWR5cvNGcUK4i58LIaPFzPMt0xq/XJI+QWffj6kv4g
 cHabCNYTdQYMkhiPQC+LLYkw6sMbw2NatajTTYNMWfR/I+7wz9k5ej6CHKPIFCSr
 xjmscypySTLfMFQjrDFZkpX2CwSp/VIbV6go36DJwAlcCRzqz+I7cajlrRK4zvyr
 DyrcaZHvClEczP9QqdPj2wqRXbmIOsDMksOu4ACTUImd4o3f2v1K6DcwRj9oUIhV
 AGR31OEMt2A+RaVvVZYR4PpixJ01vH9LcmsaOu5KkHX8X4q2osQ7eMy+FV4kV09S
 pMnxDMAyszJU8IuzUG1/HfkonNlDMivIbqpgG4ZaVW08Nq4mCxJll1vTAa9FTLz2
 z+9eocqKwf724q1RAgOB7vj4AwOwL4Ul6d18UBtyUitZz3ndLRZ8Yy6r/AhrpCsC
 3co0Y3znZbKeRjmReNl0GLG4qiKE+E7Xh23Lf3IqXg8GE2Mu+Ls=
 =srvH
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Enable using direct IO with localio
   - Added localio related tracepoints

  Bugfixes:
   - Sunrpc fixes for working with a very large cl_tasks list
   - Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
   - Fixes for handling reconnections with localio
   - Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
   - Fix COPY_NOTIFY xdr_buf size calculations
   - pNFS/Flexfiles fix for retrying requesting a layout segment for
     reads
   - Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is
     expired

  Cleanups:
   - Various other nfs & nfsd localio cleanups
   - Prepratory patches for async copy improvements that are under
     development
   - Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other
     xprts
   - Add netns inum and srcaddr to debugfs rpc_xprt info"

* tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits)
  SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired
  sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info
  pnfs/flexfiles: retry getting layout segment for reads
  NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE
  NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE
  NFSv4.2: fix COPY_NOTIFY xdr buf size calculation
  NFS: Rename struct nfs4_offloadcancel_data
  NFS: Fix typo in OFFLOAD_CANCEL comment
  NFS: CB_OFFLOAD can return NFS4ERR_DELAY
  nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it
  nfs: fix incorrect error handling in LOCALIO
  nfs: probe for LOCALIO when v3 client reconnects to server
  nfs: probe for LOCALIO when v4 client reconnects to server
  nfs/localio: remove redundant code and simplify LOCALIO enablement
  nfs_common: add nfs_localio trace events
  nfs_common: track all open nfsd_files per LOCALIO nfs_client
  nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock
  nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file
  nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
  nfsd: update percpu_ref to manage references on nfsd_net
  ...
2025-01-28 14:23:46 -08:00
Linus Torvalds 2ab002c755 Driver core and debugfs updates
Here is the big set of driver core and debugfs updates for 6.14-rc1.
 It's coming late in the merge cycle as there are a number of merge
 conflicts with your tree now, and I wanted to make sure they were
 working properly.  To resolve them, look in linux-next, and I will send
 the "fixup" patch as a response to the pull request.
 
 Included in here is a bunch of driver core, PCI, OF, and platform rust
 bindings (all acked by the different subsystem maintainers), hence the
 merge conflict with the rust tree, and some driver core api updates to
 mark things as const, which will also require some fixups due to new
 stuff coming in through other trees in this merge window.
 
 There are also a bunch of debugfs updates from Al, and there is at least
 one user that does have a regression with these, but Al is working on
 tracking down the fix for it.  In my use (and everyone else's linux-next
 use), it does not seem like a big issue at the moment.
 
 Here's a short list of the things in here:
   - driver core bindings for PCI, platform, OF, and some i/o functions.
     We are almost at the "write a real driver in rust" stage now,
     depending on what you want to do.
   - misc device rust bindings and a sample driver to show how to use
     them
   - debugfs cleanups in the fs as well as the users of the fs api for
     places where drivers got it wrong or were unnecessarily doing things
     in complex ways.
   - driver core const work, making more of the api take const * for
     different parameters to make the rust bindings easier overall.
   - other small fixes and updates
 
 All of these have been in linux-next with all of the aforementioned
 merge conflicts, and the one debugfs issue, which looks to be resolved
 "soon".
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ5koPA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymFHACfT5acDKf2Bov2Lc/5u3vBW/R6ChsAnj+LmgVI
 hcDSPodj4szR40RRnzBd
 =u5Ey
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core and debugfs updates from Greg KH:
 "Here is the big set of driver core and debugfs updates for 6.14-rc1.

  Included in here is a bunch of driver core, PCI, OF, and platform rust
  bindings (all acked by the different subsystem maintainers), hence the
  merge conflict with the rust tree, and some driver core api updates to
  mark things as const, which will also require some fixups due to new
  stuff coming in through other trees in this merge window.

  There are also a bunch of debugfs updates from Al, and there is at
  least one user that does have a regression with these, but Al is
  working on tracking down the fix for it. In my use (and everyone
  else's linux-next use), it does not seem like a big issue at the
  moment.

  Here's a short list of the things in here:

   - driver core rust bindings for PCI, platform, OF, and some i/o
     functions.

     We are almost at the "write a real driver in rust" stage now,
     depending on what you want to do.

   - misc device rust bindings and a sample driver to show how to use
     them

   - debugfs cleanups in the fs as well as the users of the fs api for
     places where drivers got it wrong or were unnecessarily doing
     things in complex ways.

   - driver core const work, making more of the api take const * for
     different parameters to make the rust bindings easier overall.

   - other small fixes and updates

  All of these have been in linux-next with all of the aforementioned
  merge conflicts, and the one debugfs issue, which looks to be resolved
  "soon""

* tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits)
  rust: device: Use as_char_ptr() to avoid explicit cast
  rust: device: Replace CString with CStr in property_present()
  devcoredump: Constify 'struct bin_attribute'
  devcoredump: Define 'struct bin_attribute' through macro
  rust: device: Add property_present()
  saner replacement for debugfs_rename()
  orangefs-debugfs: don't mess with ->d_name
  octeontx2: don't mess with ->d_parent or ->d_parent->d_name
  arm_scmi: don't mess with ->d_parent->d_name
  slub: don't mess with ->d_name
  sof-client-ipc-flood-test: don't mess with ->d_name
  qat: don't mess with ->d_name
  xhci: don't mess with ->d_iname
  mtu3: don't mess wiht ->d_iname
  greybus/camera - stop messing with ->d_iname
  mediatek: stop messing with ->d_iname
  netdevsim: don't embed file_operations into your structs
  b43legacy: make use of debugfs_get_aux()
  b43: stop embedding struct file_operations into their objects
  carl9170: stop embedding file_operations into their objects
  ...
2025-01-28 12:25:12 -08:00
Al Viro 30d61efe11 9p: fix ->rename_sem exclusion
9p wants to be able to build a path from given dentry to fs root and keep
it valid over a blocking operation.

->s_vfs_rename_mutex would be a natural candidate, but there are places
where we need that and where we have no way to tell if ->s_vfs_rename_mutex
is already held deeper in callchain.  Moreover, it's only held for
cross-directory renames; name changes within the same directory happen
without it.

Solution:
	* have d_move() done in ->rename() rather than in its caller
	* maintain a 9p-private rwsem (per-filesystem)
	* hold it exclusive over the relevant part of ->rename()
	* hold it shared over the places where we want the path.

That almost works.  FS_RENAME_DOES_D_MOVE is enough to put all d_move()
and d_exchange() calls under filesystem's control.  However, there's
also __d_unalias(), which isn't covered by any of that.

If ->lookup() hits a directory inode with preexisting dentry elsewhere
(due to e.g. rename done on server behind our back), d_splice_alias()
called by ->lookup() will move/rename that alias.

Add a couple of optional methods, so that __d_unalias() would do
	if alias->d_op->d_unalias_trylock != NULL
		if (!alias->d_op->d_unalias_trylock(alias))
			fail (resulting in -ESTALE from lookup)
	__d_move(...)
	if alias->d_op->d_unalias_unlock != NULL
		alias->d_unalias_unlock(alias)
where it currently does __d_move().  9p instances do down_write_trylock()
and up_write() of ->rename_mutex.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:24 -05:00
Al Viro 5be1fa8abd Pass parent directory inode and expected name to ->d_revalidate()
->d_revalidate() often needs to access dentry parent and name; that has
to be done carefully, since the locking environment varies from caller
to caller.  We are not guaranteed that dentry in question will not be
moved right under us - not unless the filesystem is such that nothing
on it ever gets renamed.

It can be dealt with, but that results in boilerplate code that isn't
even needed - the callers normally have just found the dentry via dcache
lookup and want to verify that it's in the right place; they already
have the values of ->d_parent and ->d_name stable.  There is a couple
of exceptions (overlayfs and, to less extent, ecryptfs), but for the
majority of calls that song and dance is not needed at all.

It's easier to make ecryptfs and overlayfs find and pass those values if
there's a ->d_revalidate() instance to be called, rather than doing that
in the instances.

This commit only changes the calling conventions; making use of supplied
values is left to followups.

NOTE: some instances need more than just the parent - things like CIFS
may need to build an entire path from filesystem root, so they need
more precautions than the usual boilerplate.  This series doesn't
do anything to that need - these filesystems have to keep their locking
mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
a-la v9fs).

One thing to keep in mind when using name is that name->name will normally
point into the pathname being resolved; the filename in question occupies
name->len bytes starting at name->name, and there is NUL somewhere after it,
but it the next byte might very well be '/' rather than '\0'.  Do not
ignore name->len.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:23 -05:00
Linus Torvalds 9c5968db9e The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs.
 
 - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the
   page allocator so we end up with the ability to allocate and free
   zero-refcount pages.  So that callers (ie, slab) can avoid a refcount
   inc & dec.
 
 - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use
   large folios other than PMD-sized ones.
 
 - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and
   fixes for this small built-in kernel selftest.
 
 - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of
   the mapletree code.
 
 - "mm: fix format issues and param types" from Keren Sun implements a
   few minor code cleanups.
 
 - "simplify split calculation" from Wei Yang provides a few fixes and a
   test for the mapletree code.
 
 - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes
   continues the work of moving vma-related code into the (relatively) new
   mm/vma.c.
 
 - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
   Hildenbrand cleans up and rationalizes handling of gfp flags in the page
   allocator.
 
 - "readahead: Reintroduce fix for improper RA window sizing" from Jan
   Kara is a second attempt at fixing a readahead window sizing issue.  It
   should reduce the amount of unnecessary reading.
 
 - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
   addresses an issue where "huge" amounts of pte pagetables are
   accumulated
   (https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/).
   Qi's series addresses this windup by synchronously freeing PTE memory
   within the context of madvise(MADV_DONTNEED).
 
 - "selftest/mm: Remove warnings found by adding compiler flags" from
   Muhammad Usama Anjum fixes some build warnings in the selftests code
   when optional compiler warnings are enabled.
 
 - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David
   Hildenbrand tightens the allocator's observance of __GFP_HARDWALL.
 
 - "pkeys kselftests improvements" from Kevin Brodsky implements various
   fixes and cleanups in the MM selftests code, mainly pertaining to the
   pkeys tests.
 
 - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
   estimate application working set size.
 
 - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
   provides some cleanups to memcg's hugetlb charging logic.
 
 - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
   removes the global swap cgroup lock.  A speedup of 10% for a tmpfs-based
   kernel build was demonstrated.
 
 - "zram: split page type read/write handling" from Sergey Senozhatsky
   has several fixes and cleaups for zram in the area of zram_write_page().
   A watchdog softlockup warning was eliminated.
 
 - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky
   cleans up the pagetable destructor implementations.  A rare
   use-after-free race is fixed.
 
 - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
   simplifies and cleans up the debugging code in the VMA merging logic.
 
 - "Account page tables at all levels" from Kevin Brodsky cleans up and
   regularizes the pagetable ctor/dtor handling.  This results in
   improvements in accounting accuracy.
 
 - "mm/damon: replace most damon_callback usages in sysfs with new core
   functions" from SeongJae Park cleans up and generalizes DAMON's sysfs
   file interface logic.
 
 - "mm/damon: enable page level properties based monitoring" from
   SeongJae Park increases the amount of information which is presented in
   response to DAMOS actions.
 
 - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes
   DAMON's long-deprecated debugfs interfaces.  Thus the migration to sysfs
   is completed.
 
 - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter
   Xu cleans up and generalizes the hugetlb reservation accounting.
 
 - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
   removes a never-used feature of the alloc_pages_bulk() interface.
 
 - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
   extends DAMOS filters to support not only exclusion (rejecting), but
   also inclusion (allowing) behavior.
 
 - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
   "introduces a new memory descriptor for zswap.zpool that currently
   overlaps with struct page for now.  This is part of the effort to reduce
   the size of struct page and to enable dynamic allocation of memory
   descriptors."
 
 - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and
   simplifies the swap allocator locking.  A speedup of 400% was
   demonstrated for one workload.  As was a 35% reduction for kernel build
   time with swap-on-zram.
 
 - "mm: update mips to use do_mmap(), make mmap_region() internal" from
   Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
   mmap_region() can be made MM-internal.
 
 - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU
   regressions and otherwise improves MGLRU performance.
 
 - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park
   updates DAMON documentation.
 
 - "Cleanup for memfd_create()" from Isaac Manjarres does that thing.
 
 - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand
   provides various cleanups in the areas of hugetlb folios, THP folios and
   migration.
 
 - "Uncached buffered IO" from Jens Axboe implements the new
   RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache
   reading and writing.  To permite userspace to address issues with
   massive buildup of useless pagecache when reading/writing fast devices.
 
 - "selftests/mm: virtual_address_range: Reduce memory" from Thomas
   Weißschuh fixes and optimizes some of the MM selftests.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5a+cwAKCRDdBJ7gKXxA
 jtoyAP9R58oaOKPJuTizEKKXvh/RpMyD6sYcz/uPpnf+cKTZxQEAqfVznfWlw/Lz
 uC3KRZYhmd5YrxU4o+qjbzp9XWX/xAE=
 =Ib2s
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "The various patchsets are summarized below. Plus of course many
  indivudual patches which are described in their changelogs.

   - "Allocate and free frozen pages" from Matthew Wilcox reorganizes
     the page allocator so we end up with the ability to allocate and
     free zero-refcount pages. So that callers (ie, slab) can avoid a
     refcount inc & dec

   - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
     use large folios other than PMD-sized ones

   - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
     and fixes for this small built-in kernel selftest

   - "mas_anode_descend() related cleanup" from Wei Yang tidies up part
     of the mapletree code

   - "mm: fix format issues and param types" from Keren Sun implements a
     few minor code cleanups

   - "simplify split calculation" from Wei Yang provides a few fixes and
     a test for the mapletree code

   - "mm/vma: make more mmap logic userland testable" from Lorenzo
     Stoakes continues the work of moving vma-related code into the
     (relatively) new mm/vma.c

   - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
     Hildenbrand cleans up and rationalizes handling of gfp flags in the
     page allocator

   - "readahead: Reintroduce fix for improper RA window sizing" from Jan
     Kara is a second attempt at fixing a readahead window sizing issue.
     It should reduce the amount of unnecessary reading

   - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
     addresses an issue where "huge" amounts of pte pagetables are
     accumulated:

       https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/

     Qi's series addresses this windup by synchronously freeing PTE
     memory within the context of madvise(MADV_DONTNEED)

   - "selftest/mm: Remove warnings found by adding compiler flags" from
     Muhammad Usama Anjum fixes some build warnings in the selftests
     code when optional compiler warnings are enabled

   - "mm: don't use __GFP_HARDWALL when migrating remote pages" from
     David Hildenbrand tightens the allocator's observance of
     __GFP_HARDWALL

   - "pkeys kselftests improvements" from Kevin Brodsky implements
     various fixes and cleanups in the MM selftests code, mainly
     pertaining to the pkeys tests

   - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
     estimate application working set size

   - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
     provides some cleanups to memcg's hugetlb charging logic

   - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
     removes the global swap cgroup lock. A speedup of 10% for a
     tmpfs-based kernel build was demonstrated

   - "zram: split page type read/write handling" from Sergey Senozhatsky
     has several fixes and cleaups for zram in the area of
     zram_write_page(). A watchdog softlockup warning was eliminated

   - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
     Brodsky cleans up the pagetable destructor implementations. A rare
     use-after-free race is fixed

   - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
     simplifies and cleans up the debugging code in the VMA merging
     logic

   - "Account page tables at all levels" from Kevin Brodsky cleans up
     and regularizes the pagetable ctor/dtor handling. This results in
     improvements in accounting accuracy

   - "mm/damon: replace most damon_callback usages in sysfs with new
     core functions" from SeongJae Park cleans up and generalizes
     DAMON's sysfs file interface logic

   - "mm/damon: enable page level properties based monitoring" from
     SeongJae Park increases the amount of information which is
     presented in response to DAMOS actions

   - "mm/damon: remove DAMON debugfs interface" from SeongJae Park
     removes DAMON's long-deprecated debugfs interfaces. Thus the
     migration to sysfs is completed

   - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
     Peter Xu cleans up and generalizes the hugetlb reservation
     accounting

   - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
     removes a never-used feature of the alloc_pages_bulk() interface

   - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
     extends DAMOS filters to support not only exclusion (rejecting),
     but also inclusion (allowing) behavior

   - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
     introduces a new memory descriptor for zswap.zpool that currently
     overlaps with struct page for now. This is part of the effort to
     reduce the size of struct page and to enable dynamic allocation of
     memory descriptors

   - "mm, swap: rework of swap allocator locks" from Kairui Song redoes
     and simplifies the swap allocator locking. A speedup of 400% was
     demonstrated for one workload. As was a 35% reduction for kernel
     build time with swap-on-zram

   - "mm: update mips to use do_mmap(), make mmap_region() internal"
     from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
     mmap_region() can be made MM-internal

   - "mm/mglru: performance optimizations" from Yu Zhao fixes a few
     MGLRU regressions and otherwise improves MGLRU performance

   - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
     Park updates DAMON documentation

   - "Cleanup for memfd_create()" from Isaac Manjarres does that thing

   - "mm: hugetlb+THP folio and migration cleanups" from David
     Hildenbrand provides various cleanups in the areas of hugetlb
     folios, THP folios and migration

   - "Uncached buffered IO" from Jens Axboe implements the new
     RWF_DONTCACHE flag which provides synchronous dropbehind for
     pagecache reading and writing. To permite userspace to address
     issues with massive buildup of useless pagecache when
     reading/writing fast devices

   - "selftests/mm: virtual_address_range: Reduce memory" from Thomas
     Weißschuh fixes and optimizes some of the MM selftests"

* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  mm/compaction: fix UBSAN shift-out-of-bounds warning
  s390/mm: add missing ctor/dtor on page table upgrade
  kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
  tools: add VM_WARN_ON_VMG definition
  mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
  seqlock: add missing parameter documentation for raw_seqcount_try_begin()
  mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
  mm/page_alloc: remove the incorrect and misleading comment
  zram: remove zcomp_stream_put() from write_incompressible_page()
  mm: separate move/undo parts from migrate_pages_batch()
  mm/kfence: use str_write_read() helper in get_access_type()
  selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
  kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
  selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
  selftests/mm: vm_util: split up /proc/self/smaps parsing
  selftests/mm: virtual_address_range: unmap chunks after validation
  selftests/mm: virtual_address_range: mmap() without PROT_WRITE
  selftests/memfd/memfd_test: fix possible NULL pointer dereference
  mm: add FGP_DONTCACHE folio creation flag
  mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
  ...
2025-01-26 18:36:23 -08:00
Linus Torvalds c159dfbdd4 Mainly individually changelogged singleton patches. The patch series in
this pull are:
 
 - "lib min_heap: Improve min_heap safety, testing, and documentation"
   from Kuan-Wei Chiu provides various tightenings to the min_heap library
   code.
 
 - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms some
   cleanup and Rust preparation in the xarray library code.
 
 - "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes
   pathnames in some code comments.
 
 - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the
   new secs_to_jiffies() in various places where that is appropriate.
 
 - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
   switches two filesystems to the new mount API.
 
 - "Convert ocfs2 to use folios" from Matthew Wilcox does that.
 
 - "Remove get_task_comm() and print task comm directly" from Yafang Shao
   removes now-unneeded calls to get_task_comm() in various places.
 
 - "squashfs: reduce memory usage and update docs" from Phillip Lougher
   implements some memory savings in squashfs and performs some
   maintainability work.
 
 - "lib: clarify comparison function requirements" from Kuan-Wei Chiu
   tightens the sort code's behaviour and adds some maintenance work.
 
 - "nilfs2: protect busy buffer heads from being force-cleared" from
   Ryusuke Konishi fixes an issues in nlifs when the fs is presented with a
   corrupted image.
 
 - "nilfs2: fix kernel-doc comments for function return values" from
   Ryusuke Konishi fixes some nilfs kerneldoc.
 
 - "nilfs2: fix issues with rename operations" from Ryusuke Konishi
   addresses some nilfs BUG_ONs which syzbot was able to trigger.
 
 - "minmax.h: Cleanups and minor optimisations" from David Laight
   does some maintenance work on the min/max library code.
 
 - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work
   on the xarray library code.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5SP5QAKCRDdBJ7gKXxA
 jqN7AQChvwXGG43n4d5SDiA/rH7ddvowQcDqhC9cAMJ1ReR7qwEA8/LIWDE4PdMX
 mJnaZ1/ibpEpearrChCViApQtcyEGQI=
 =ti4E
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "Mainly individually changelogged singleton patches. The patch series
  in this pull are:

   - "lib min_heap: Improve min_heap safety, testing, and documentation"
     from Kuan-Wei Chiu provides various tightenings to the min_heap
     library code

   - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms
     some cleanup and Rust preparation in the xarray library code

   - "Update reference to include/asm-<arch>" from Geert Uytterhoeven
     fixes pathnames in some code comments

   - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses
     the new secs_to_jiffies() in various places where that is
     appropriate

   - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
     switches two filesystems to the new mount API

   - "Convert ocfs2 to use folios" from Matthew Wilcox does that

   - "Remove get_task_comm() and print task comm directly" from Yafang
     Shao removes now-unneeded calls to get_task_comm() in various
     places

   - "squashfs: reduce memory usage and update docs" from Phillip
     Lougher implements some memory savings in squashfs and performs
     some maintainability work

   - "lib: clarify comparison function requirements" from Kuan-Wei Chiu
     tightens the sort code's behaviour and adds some maintenance work

   - "nilfs2: protect busy buffer heads from being force-cleared" from
     Ryusuke Konishi fixes an issues in nlifs when the fs is presented
     with a corrupted image

   - "nilfs2: fix kernel-doc comments for function return values" from
     Ryusuke Konishi fixes some nilfs kerneldoc

   - "nilfs2: fix issues with rename operations" from Ryusuke Konishi
     addresses some nilfs BUG_ONs which syzbot was able to trigger

   - "minmax.h: Cleanups and minor optimisations" from David Laight does
     some maintenance work on the min/max library code

   - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance
     work on the xarray library code"

* tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits)
  ocfs2: use str_yes_no() and str_no_yes() helper functions
  include/linux/lz4.h: add some missing macros
  Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent
  Xarray: remove repeat check in xas_squash_marks()
  Xarray: distinguish large entries correctly in xas_split_alloc()
  Xarray: move forward index correctly in xas_pause()
  Xarray: do not return sibling entries from xas_find_marked()
  ipc/util.c: complete the kernel-doc function descriptions
  gcov: clang: use correct function param names
  latencytop: use correct kernel-doc format for func params
  minmax.h: remove some #defines that are only expanded once
  minmax.h: simplify the variants of clamp()
  minmax.h: move all the clamp() definitions after the min/max() ones
  minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
  minmax.h: reduce the #define expansion of min(), max() and clamp()
  minmax.h: update some comments
  minmax.h: add whitespace around operators and after commas
  nilfs2: do not update mtime of renamed directory that is not moved
  nilfs2: handle errors that nilfs_prepare_chunk() may return
  CREDITS: fix spelling mistake
  ...
2025-01-26 17:50:53 -08:00
Andrew Morton 91fe0e4d04 Documentation/filesystems/proc.rst: fix possessive form of "process"
The possessive form of "process" is "process's".  Fix up various
misdirected attempts at this.  Also reflow some paragraphs.

Cc: David Hildenbrand <david@redhat.com>
Cc: Wang Yaxin <wang.yaxin@zte.com.cn>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:40 -08:00
xu xin 3ab76c767b ksm: add ksm involvement information for each process
In /proc/<pid>/ksm_stat, add two extra ksm involvement items including
KSM_mergeable and KSM_merge_any.  It helps administrators to better know
the system's KSM behavior at process level.

ksm_merge_any: yes/no
	whether the process'mm is added by prctl() into the candidate list
	of KSM or not, and fully enabled at process level.

ksm_mergeable: yes/no
    whether any VMAs of the process'mm are currently applicable to KSM.

Purpose
=======
These two items are just to improve the observability of KSM at process
level, so that users can know if a certain process has enabled KSM.

For example, if without these two items, when we look at
/proc/<pid>/ksm_stat and there's no merging pages found, We are not sure
whether it is because KSM was not enabled or because KSM did not
successfully merge any pages.

Although "mg" in /proc/<pid>/smaps indicate VM_MERGEABLE, it's opaque
and not very obvious for non professionals.

[akpm@linux-foundation.org: wording tweaks, per David and akpm]
Link: https://lkml.kernel.org/r/20250110174034304QOb8eDoqtFkp3_t8mqnqc@zte.com.cn
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Cc: Wang Yaxin <wang.yaxin@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:40 -08:00
Phillip Lougher f2cad9850e Documentation: update the Squashfs filesystem documentation
This patch updates the following which are out of date.

- Zstd has been added to the compression algorithms supported.
- The filesystem mailing list (for the kernel code) is changed to
  linux-fsdevel rather than the now very little used Sourceforge
  mailing list.
- The Squashfs website has been changed to the Squashfs-tools github
  repository.
- The fact that Squashfs-tools is likely packaged by the linux
  distribution is mentioned.

Link: https://lkml.kernel.org/r/20241229233752.54481-4-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24 22:47:21 -08:00
Bernd Schubert a7040a06e4 fuse: Add fuse-io-uring design documentation
[Add several documentation updates I had missed after
renaming functions and also fixes 'make htmldocs'.]

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-01-24 11:53:56 +01:00
Linus Torvalds d0f93ac2c3 Documentation changes this time around include:
- Quite a bit of Chinese and Spanish translation work.
 
 - Clarifying that Git commit IDs >12chars are OK
 
 - A new nvme-multipath document
 
 - A reorganization of the admin-guide top-level page to make it readable
 
 - Clarification of the role of Acked-by and maintainer discretion on their
   acceptance.
 
 - Some reorganization of debugging-oriented docs.
 
 ...and typo fixes, documentation updates, etc. as usual.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmeOp8EPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YipUH/iffvlVYuqoVdPUFWdmsiNjwOCRE2MIfp8qO
 tPTRRHJAny+NlFT0IWlGUbLNoNXtvpN47YlkaeAjdrsjASerfpwzje7t4Z1B+jWT
 0YwGBCvDIGasfRCx7D14+w5aqkEEynfsy+QurwcuDxcHMQGwt7ZCuTNOVO6BULEr
 L++BMwqapUr5IemP4ItQqDVVF9sp6bWEhaOnTTJCLU6oG23uUSSA/59sJmwDJUk7
 6J3VGO1An4Jte9WX7qkVrSBNO5cOOhaFiFXIeNxfOioOPctBwxKiHDJnzVud8ipz
 R+tnUI/8hEvyJ7GZFezyZxmMnFs0P2DEYAkaN+hBs/nUjx0dKUg=
 =YxaS
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.14' of git://git.lwn.net/linux

Pull Documentation updates from Jonathan Corbet:

 - Quite a bit of Chinese and Spanish translation work

 - Clarifying that Git commit IDs >12chars are OK

 - A new nvme-multipath document

 - A reorganization of the admin-guide top-level page to make it
   readable

 - Clarification of the role of Acked-by and maintainer discretion on
   their acceptance

 - Some reorganization of debugging-oriented docs

... and typo fixes, documentation updates, etc as usual

* tag 'docs-6.14' of git://git.lwn.net/linux: (50 commits)
  Documentation: Fix x86_64 UEFI outdated references to elilo
  Documentation/sysctl: Add timer_migration to kernel.rst
  docs/mm: Physical memory: Remove zone_t
  docs: submitting-patches: clarify that signers may use their discretion on tags
  docs: submitting-patches: clarify difference between Acked-by and Reviewed-by
  docs: submitting-patches: clarify Acked-by and introduce "# Suffix"
  Documentation: bug-hunting.rst: remove odd contact information
  docs/zh_CN: Add sak index Chinese translation
  doc: module: DEFAULT_SYMBOL_NAMESPACE must be defined before #includes
  doc: module: Fix documented type of namespace
  Documentation/kernel-parameters: Fix a reference to vga-softcursor.rst
  docs/zh_CN: Add landlock index Chinese translation
  Documentation: Fix typo localmodonfig -> localmodconfig
  overlayfs.rst: Fix and improve grammar
  docs/zh_CN: Add siphash index Chinese translation
  docs/zh_CN: Add security IMA-templates Chinese translation
  docs/zh_CN: Add security digsig Chinese translation
  Align git commit ID abbreviation guidelines and checks
  docs: process: submitting-patches: split canonical patch format section
  docs/zh_CN: Add security lsm Chinese translation
  ...
2025-01-21 18:00:00 -08:00
Linus Torvalds 2622f29041 bcachefs updates for 6.14-rc1
Lots of scalability work, another big on disk format change. On disk
 format version goes from 1.13 to 1.20.
 
 Like 6.11, this is another big and expensive automatic/required on disk
 format upgrade. This is planned to be the last big on disk format
 upgrade before the experimental label comes off. There will be one more
 minor on disk format update for a few things that couldn't make this
 release.
 
 Headline improvements:
 - Fix mount time regression that some users encountered post the 6.11
   disk accounting rewrite.
 
   Accounting keys were encoded little endian (typetag in the low bits) -
   which didn't anticipate adding accounting keys for every inode, which
   aren't stored in memory and we don't want to scan at mount time.
 
 - fsck time on large filesystems is improved by multiple orders of
   magnitude. Previously, 100TB was about the practical max filesystem
   size, where users were reporting fsck times of a day+. With the new
   changes (which nearly eliminate backpointers fsck overhead), we fsck'd
   a filesystem with 10PB of data in 1.5 hours.
 
   The problematic fsck passes were walking every extent and checking for
   missing backpointers, and walking every backpointer to check for
   dangling backpointers. As we've been adding more and more runtime self
   healing there was no reason to keep around the backpointers -> extents
   pass; dangling backpointers are just deleted, and we can do that when
   using them - thus, backpointers -> extents is now only run in debug
   mode.
 
   extents -> backpointers does need to exist, since missing backpointers
   would mean we can't find data to move it (for e.g. copygc, device
   evacuate, scrub). But the new on disk format version makes possible a
   new strategy where we sum up backpointers within a bucket and check it
   against the bucket sector counts, and then only scan for missing
   backpointers if the counts are off (and then, only for specific
   buckets).
 
 Full list of on disk format changes:
 - 1.14: backpointer_bucket_gen
   Backpointers now have a field for the bucket generation number,
   replacing the obsolete bucket_offset field. This is needed for the
   new "sum up backpointers within a bucket" code, since backpointers use
   the btree write buffer - meaning we will see stale reads, and this
   runs online, with the filesystem in full rw mode.
 
 - 1.15: disk_accounting_big_endian
   As previously described, fix the endianness of accounting keys so that
   accounting keys with the same typetag sort together, and accounting
   read can skip types it's not interested in.
 
 - 1.16: reflink_p_may_update_opts:
   This version indicates that a new reflink pointer field is understood
   and may be used; the field indicates whether the reflink pointer has
   permissions to update IO path options (e.g. compression, replicas) may
   be updated on the indirect extent it points to.
 
   This completes the rebalance/reflink data path option handling from
   the 6.13 pull request.
 
 - 1.17: inode_depth
   Add a new inode field, bi_depth, to accelerate the
   check_directory_structure fsck path, which checks for loops in the
   filesystem heirarchy.
 
   check_inodes and check_dirents check connectivity, so
   check_directory_structure only has to check for loops - by walking
   back up to the root from every directory.
 
   But a path can't be a loop if it has a counter that increases
   monotonically from root to leaf - adding a depth counter means that we
   can check for loops with only local (parent -> child) checks. We might
   need to occasionally renumber the depth field in fsck if directories
   have been moved around, but then future fsck runs will be much faster.
 
 - 1.18: persistent_inode_cursors
 
   Previously, the cursor used for inode allocation was only kept in
   memory, which meant that users with large filesystems and lots of
   files were reporting that the first create after mounting would take
   awhile - since it had to scan from the start.
 
   Inode allocation cursors are now persistent, and also include a
   generation field (incremented on wraparound, which will only happen if
   inode allocation is restricted to 32 bit inodes), so that we don't
   have to leave inode_generation keys around after a delete.
 
   The option for 32 bit inode numbers may now also be set on individual
   directories, and non-32 bit inode allocations are disallowed from
   allocating from the 32 bit part of the inode number space.
 
 - 1.19: autofix_errors
 
   Runtime self healing is now the default.o
 
 - 1.20: directory size (from Hongbo)
 
   directory i_size is now meaningful, and not 0.
 
 Release notes from the previous 6.13 pull request:
 
 - Self healing work:
   Allocator and reflink now run the exact same check/repair code that
   fsck does at runtime, where applicable.
 
   The long term goal here is to remove inconsistent() errors (that cause
   us to go emergency read only) by lifting fsck code up to normal
   runtime paths; we should only go emergency read-only if we detect an
   inconsistency that was due to a runtime bug - or truly catastrophic
   damage (corrupted btree roots/interior nodes).
 
 - Reflink repair no longer deletes reflink pointers: instead we flip an
   error bit and log the error, and they can still be deleted by file
   deletion. This means a temporary failure to find an indirect extent
   (perhaps repaired later by btree node scan) won't result in
   unnecessary data loss
 
 - Improvements to rebalance data path option handling: we can now
   correctly apply changed filesystem-level io path options to pending
   rebalance work, and soon we'll be able to apply file-level io path
   option changes to indirect extents.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmeOiboACgkQE6szbY3K
 bnY8zQ//Yoy+5ZA07tQV+Fi0JV0DZ6w3xotxNhAUeaKgCKHgp37gcKa47TFir4pd
 6ha7PQV3GimFwHoIUfOY5X4Y+bEm16XblyfK3VU6IgGiE3cUg+1q8b8WrD2eHmLJ
 qIT8DWWpAM2AcZ/f5G37hH8pxn2t0TUuzJ1Sz7wEhJUNZEP+z+qaacnGhwuc8yQ3
 Srj7Cc/NSd9T+6G2yKhERFITUrXmqVGgGihhVZqs0hCAPt8bwn5K8d1H2IKoj1N6
 jJ3MQfmPIzUk0mfIjHrBlqrA+3tjtt5LGU+QpOWs8g509xHCP0BfGGOXQhjMjHVI
 JVSqAuIENK4V1ubz7BZcSoPAVncPeFl8Ly5Qdw5FlDBux9kKsch8wJPjn1A1gkPt
 Fb9VBTRkCK7WqUzkmbQh152SNC/0plb/8qFjywHNkvYyGMMlJME8zDIg40RN+0Ql
 ckXjlvdVGm0GbyM2GLth4gbOSXDzKrq12i3rWROnOLZ0Q2SBKfJe5K0UdRat1/nu
 2sWWJNJqDzaaP1Gd/qk3Yht06GWnhI/17Bl/Znt5M8rxtSBbbxO58vi3gxasbccS
 l3qozuNouvAMNRBqE4ayVtjV+Aj69j1IBJnAfCareDDDf6ugjooLqu27BQkLOPg7
 wswq633T6WG+UfQ44GvseiCaDW5MMh0aq7vxzjnBUoTz5usMfxg=
 =d0Zb
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2025-01-20.2' of git://evilpiepirate.org/bcachefs

Pull bcachefs updates from Kent Overstreet:
 "Lots of scalability work, another big on-disk format change. On-disk
  format version goes from 1.13 to 1.20.

  Like 6.11, this is another big and expensive automatic/required on
  disk format upgrade. This is planned to be the last big on disk format
  upgrade before the experimental label comes off. There will be one
  more minor on disk format update for a few things that couldn't make
  this release.

  Headline improvements:

   - Self healing work:

     Allocator and reflink now run the exact same check/repair code that
     fsck does at runtime, where applicable.

     The long term goal here is to remove inconsistent() errors (that
     cause us to go emergency read only) by lifting fsck code up to
     normal runtime paths; we should only go emergency read-only if we
     detect an inconsistency that was due to a runtime bug - or truly
     catastrophic damage (corrupted btree roots/interior nodes).

   - Reflink repair no longer deletes reflink pointers:

     Instead we flip an error bit and log the error, and they can still
     be deleted by file deletion. This means a temporary failure to find
     an indirect extent (perhaps repaired later by btree node scan)
     won't result in unnecessary data loss

   - Improvements to rebalance data path option handling:

     We can now correctly apply changed filesystem-level io path options
     to pending rebalance work, and soon we'll be able to apply
     file-level io path option changes to indirect extents

   - Fix mount time regression that some users encountered post the 6.11
     disk accounting rewrite.

     Accounting keys were encoded little endian (typetag in the low
     bits) - which didn't anticipate adding accounting keys for every
     inode, which aren't stored in memory and we don't want to scan at
     mount time.

   - fsck time on large filesystems is improved by multiple orders of
     magnitude. Previously, 100TB was about the practical max filesystem
     size, where users were reporting fsck times of a day+. With the new
     changes (which nearly eliminate backpointers fsck overhead), we
     fsck'd a filesystem with 10PB of data in 1.5 hours.

     The problematic fsck passes were walking every extent and checking
     for missing backpointers, and walking every backpointer to check
     for dangling backpointers. As we've been adding more and more
     runtime self healing there was no reason to keep around the
     backpointers -> extents pass; dangling backpointers are just
     deleted, and we can do that when using them - thus, backpointers ->
     extents is now only run in debug mode.

     extents -> backpointers does need to exist, since missing
     backpointers would mean we can't find data to move it (for e.g.
     copygc, device evacuate, scrub). But the new on disk format version
     makes possible a new strategy where we sum up backpointers within a
     bucket and check it against the bucket sector counts, and then only
     scan for missing backpointers if the counts are off (and then, only
     for specific buckets).

  Full list of on disk format changes:

   - 1.14: backpointer_bucket_gen

     Backpointers now have a field for the bucket generation number,
     replacing the obsolete bucket_offset field. This is needed for the
     new "sum up backpointers within a bucket" code, since backpointers
     use the btree write buffer - meaning we will see stale reads, and
     this runs online, with the filesystem in full rw mode.

   - 1.15: disk_accounting_big_endian

     As previously described, fix the endianness of accounting keys so
     that accounting keys with the same typetag sort together, and
     accounting read can skip types it's not interested in.

   - 1.16: reflink_p_may_update_opts:

     This version indicates that a new reflink pointer field is
     understood and may be used; the field indicates whether the reflink
     pointer has permissions to update IO path options (e.g.
     compression, replicas) may be updated on the indirect extent it
     points to.

     This completes the rebalance/reflink data path option handling from
     the 6.13 pull request.

   - 1.17: inode_depth

     Add a new inode field, bi_depth, to accelerate the
     check_directory_structure fsck path, which checks for loops in the
     filesystem heirarchy.

     check_inodes and check_dirents check connectivity, so
     check_directory_structure only has to check for loops - by walking
     back up to the root from every directory.

     But a path can't be a loop if it has a counter that increases
     monotonically from root to leaf - adding a depth counter means that
     we can check for loops with only local (parent -> child) checks. We
     might need to occasionally renumber the depth field in fsck if
     directories have been moved around, but then future fsck runs will
     be much faster.

   - 1.18: persistent_inode_cursors

     Previously, the cursor used for inode allocation was only kept in
     memory, which meant that users with large filesystems and lots of
     files were reporting that the first create after mounting would
     take awhile - since it had to scan from the start.

     Inode allocation cursors are now persistent, and also include a
     generation field (incremented on wraparound, which will only happen
     if inode allocation is restricted to 32 bit inodes), so that we
     don't have to leave inode_generation keys around after a delete.

     The option for 32 bit inode numbers may now also be set on
     individual directories, and non-32 bit inode allocations are
     disallowed from allocating from the 32 bit part of the inode number
     space.

   - 1.19: autofix_errors

     Runtime self healing is now the default.o

   - 1.20: directory size (from Hongbo)

     directory i_size is now meaningful, and not 0"

* tag 'bcachefs-2025-01-20.2' of git://evilpiepirate.org/bcachefs: (268 commits)
  bcachefs: Fix check_inode_hash_info_matches_root()
  bcachefs: Document issue with bch_stripe layout
  bcachefs: Fix self healing on read error
  bcachefs: Pop all the transactions from the abort one
  bcachefs: Only abort the transactions in the cycle
  bcachefs: Introduce lock_graph_pop_from
  bcachefs: Convert open-coded lock_graph_pop_all to helper
  bcachefs: Do not allow no fail lock request to fail
  bcachefs: Merge the condition to avoid additional invocation
  Revert "bcachefs: Fix bch2_btree_node_upgrade()"
  bcachefs: bcachefs_metadata_version_directory_size
  bcachefs: make directory i_size meaningful
  bcachefs: check_unreachable_inodes is not actually PASS_ONLINE yet
  bcachefs: Don't use BTREE_ITER_cached when walking alloc btree during fsck
  bcachefs: Check for dirents to overwritten inodes
  bcachefs: bch2_btree_iter_peek_slot() handles navigating to nonexistent depth
  bcachefs: Don't set btree_path to updtodate if we don't fill
  bcachefs: __bch2_btree_pos_to_text()
  bcachefs: printbuf_reset() handles tabstops
  bcachefs: Silence read-only errors when deleting snapshots
  ...
2025-01-20 13:55:19 -08:00
Al Viro f7862dfef6 saner replacement for debugfs_rename()
Existing primitive has several problems:
	1) calling conventions are clumsy - it returns a dentry reference
that is either identical to its second argument or is an ERR_PTR(-E...);
in both cases no refcount changes happen.  Inconvenient for users and
bug-prone; it would be better to have it return 0 on success and -E... on
failure.
	2) it allows cross-directory moves; however, no such caller have
ever materialized and considering the way debugfs is used, it's unlikely
to happen in the future.  What's more, any such caller would have fun
issues to deal with wrt interplay with recursive removal.  It also makes
the calling conventions clumsier...
	3) tautological rename fails; the callers have no race-free way
to deal with that.
	4) new name must have been formed by the caller; quite a few
callers have it done by sprintf/kasprintf/etc., ending up with considerable
boilerplate.

Proposed replacement: int debugfs_change_name(dentry, fmt, ...).  All callers
convert to that easily, and it's simpler internally.

IMO debugfs_rename() should go; if we ever get a real-world use case for
cross-directory moves in debugfs, we can always look into the right way
to handle that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20250112080705.141166-21-viro@zeniv.linux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-15 13:14:37 +01:00
Mike Snitzer 4a489220aa nfs: probe for LOCALIO when v3 client reconnects to server
Re-enabling NFSv3 LOCALIO is made more complex (than NFSv4) because v3
is stateless.  As such, the hueristic used to identify a LOCALIO probe
point is more adhoc by nature: if/when NFSv3 client IO begins to
complete again in terms of normal RPC-based NFSv3 server IO, attempt
nfs_local_probe_async().

Care is taken to throttle the frequency of nfs_local_probe_async(),
otherwise there could be a flood of repeat calls to
nfs_local_probe_async().

The throttle is admin controlled using a new module parameter for
nfsv3, e.g.:
  echo 512 > /sys/module/nfsv3/parameters/nfs3_localio_probe_throttle

Probe for NFSv3 LOCALIO every N IO requests (512 in this case). Must
be power-of-2, defaults to 0 (probing disabled).

On systems that expect to use LOCALIO with NFSv3 the admin should
configure the 'nfs3_localio_probe_throttle' module parameter.

This commit backfills module parameter documentation in localio.rst

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-14 17:05:10 -05:00
Mike Snitzer b33f7dec3a nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
Also update Documentation/filesystems/nfs/localio.rst accordingly
and reduce the technical documentation debt that was previously
captured in that document.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-14 17:05:07 -05:00
Mike Snitzer 3feec68563 nfs/localio: add direct IO enablement with sync and async IO support
This commit simply adds the required O_DIRECT plumbing.  It doesn't
address the fact that NFS doesn't ensure all writes are page aligned
(nor device logical block size aligned as required by O_DIRECT).

Because NFS will read-modify-write for IO that isn't aligned, LOCALIO
will not use O_DIRECT semantics by default if/when an application
requests the use of O_DIRECT.  Allow the use of O_DIRECT semantics by:
1: Adding a flag to the nfs_pgio_header struct to allow the NFS
   O_DIRECT layer to signal that O_DIRECT was used by the application
2: Adding a 'localio_O_DIRECT_semantics' NFS module parameter that
   when enabled will cause LOCALIO to use O_DIRECT semantics (this may
   cause IO to fail if applications do not properly align their IO).

This commit is derived from code developed by Weston Andros Adamson.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-14 17:04:02 -05:00
Geert Uytterhoeven a883764111 overlayfs.rst: Fix and improve grammar
- Correct "in a way the" to "in a way that",
  - Add a comma to improve readability.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/cf07f705d63f04ebf7ba4ecafdc9ab6f63960e3d.1736239148.git.geert+renesas@glider.be
2025-01-09 11:51:44 -07:00
Randy Dunlap 135ec43eb2
fiemap: use kernel-doc includes in fiemap docbook
Add some kernel-doc notation to structs in fiemap header files
then pull that into Documentation/filesystems/fiemap.rst
instead of duplicating the header file structs in fiemap.rst.
This helps to future-proof fiemap.rst against struct changes.

Add missing flags documentation from header files into fiemap.rst
for FIEMAP_FLAG_CACHE and FIEMAP_EXTENT_SHARED.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20241121011352.201907-1-rdunlap@infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-22 11:29:50 +01:00
Dennis Lam 71008e5d6f docs: filesystems: bcachefs: fixed some spelling mistakes in the bcachefs coding style page
Specifically, fixed spelling of "commit" and pluralization of last sentence.

Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:14 -05:00
Bingwu Zhang 9fb89b9765 Documentation: filesystems: fix two misspells
This fixes two small misspells in the filesystems documentation.

Signed-off-by: Bingwu Zhang <xtex@aosc.io>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241208035447.162465-2-xtex@envs.net
2024-12-13 08:46:08 -07:00
Carlos Maiolino dfddf35310 Documentation: Fix simple typo on filesystems/porting.rst
Just spotted this while reading the doc.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241213151743.23435-1-cem@kernel.org
2024-12-13 08:35:49 -07:00
Linus Torvalds 7d4050728c vfs-6.13-rc1.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ0YDhQAKCRCRxhvAZXjc
 ouy/AQDds2VXT3baRn5mvLOnWaN9tez+TnPLUKbS8m4srJUrGgD/SQkYc14vANGL
 iIw6oAhDhdzrjrm0rxr2COXah6me7g0=
 =Xu0X
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Fix a few iomap bugs

 - Fix a wrong argument in backing file callback

 - Fix security mount option retrieval in statmount()

 - Cleanup how statmount() handles unescaped options

 - Add a missing inode_owner_or_capable() check for setting write hints

 - Clear the return value in read_kcore_iter() after a successful
   iov_iter_zero()

 - Fix a mount_setattr() selftest

 - Fix function signature in mount api documentation

 - Remove duplicate include header in the fscache code

* tag 'vfs-6.13-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs/backing_file: fix wrong argument in callback
  fs_parser: update mount_api doc to match function signature
  fs: require inode_owner_or_capable for F_SET_RW_HINT
  fs/proc/kcore.c: Clear ret value in read_kcore_iter after successful iov_iter_zero
  statmount: fix security option retrieval
  statmount: clean up unescaped option handling
  fscache: Remove duplicate included header
  iomap: elide flush from partial eof zero range
  iomap: lift zeroed mapping handling into iomap_zero_range()
  iomap: reset per-iter state on non-error iter advances
  iomap: warn on zero range of a post-eof folio
  selftests/mount_setattr: Fix failures on 64K PAGE_SIZE kernels
2024-11-27 08:11:46 -08:00
Linus Torvalds e68ce9474a A few late-arriving fixes, plus two more significant changes that were
*almost* ready at the beginning of the merge window:
 
 - A new document on debugging techniques from Sebastian Fricke
 
 - A clarification on MODULE_LICENSE terms meant to head off the sort of
   confusion that led to the recent Tuxedo Computers mess.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmdFEf0PHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YLCQH+wY0lGEF5BloFrNOcwKoB96rXQjLMlPVpccP
 lWVprapS+NrlhTq4RZ9b6qbQ1RAdu0JCppew1viwclO8g8SmUoXmqNnlYIFH+3MB
 HZETbPWUHK2BRQqV7h3VkgvO30hUa0kHL3WfmKpGEG1P6FsQQ5o3WDi3YN8GM6xk
 tfHSiR4rgBw40VLyeDtRi++aEgYa/DfWpdtco58poCiAS6soTDDEWCxSBdibeDOQ
 YDuj1NtqieMk963z8CoJm/Qbw/ZLfW2jd3A43cZ0h6g/oloVYSucFcMjXpePvoZr
 9BSkU9OyX5BRfhU/6EbU8eWYjgu0BuBk5uvCwkQgHz1p05MGRDE=
 =MmbD
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.13-2' of git://git.lwn.net/linux

Pull more documentation updates from Jonathan Corbet:
 "A few late-arriving fixes, plus two more significant changes that were
  *almost* ready at the beginning of the merge window:

   - A new document on debugging techniques from Sebastian Fricke

   - A clarification on MODULE_LICENSE terms meant to head off the sort
     of confusion that led to the recent Tuxedo Computers mess"

* tag 'docs-6.13-2' of git://git.lwn.net/linux:
  docs: Add debugging guide for the media subsystem
  docs: Add debugging section to process
  docs/licensing: Clarify wording about "GPL" and "Proprietary"
  docs: core-api/gfp_mask-from-fs-io: indicate that vmalloc supports GFP_NOFS/GFP_NOIO
  Documentation: kernel-doc: enumerate identifier *type*s
  Documentation: pwrseq: Fix trivial misspellings
  Documentation: filesystems: update filename extensions
2024-11-26 13:44:27 -08:00
Linus Torvalds 44b4d13b70 f2fs-for-6.13-rc1
This series introduces a device aliasing feature where user can carve out
 partitions but reclaim the space back by deleting aliased file in root dir.
 In addition to that, there're numerous minor bug fixes in zoned device support,
 checkpoint=disable, extent cache management, fiemap, and lazytime mount option.
 The full list of noticeable changes can be found below.
 
 Enhancement:
  - introduce device aliasing file
  - add stats in debugfs to show multiple devices
  - add a sysfs node to limit max read extent count per-inode
  - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable
  - decrease spare area for pinned files for zoned devices
 
 Bug fix:
  - Revert "f2fs: remove unreachable lazytime mount option parsing"
  - adjust unusable cap before checkpoint=disable mode
  - fix to drop all discards after creating snapshot on lvm device
  - fix to shrink read extent node in batches
  - fix changing cursegs if recovery fails on zoned device
  - fix to adjust appropriate length for fiemap
  - fix fiemap failure issue when page size is 16KB
  - fix to avoid forcing direct write to use buffered IO on inline_data inode
  - fix to map blocks correctly for direct write
  - fix to account dirty data in __get_secs_required()
  - fix null-ptr-deref in f2fs_submit_page_bio()
  - f2fs: compress: fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmdD25MACgkQQBSofoJI
 UNKgLhAAgr0Dy/VWDgRlMovckq0q5EyQu/Jospv6mJyErQ4pZwwidNn9FSf0yua9
 O0Pofs1zMFWoe5R2UOvwOnahmvlwD1nnRMylA10/9hp+/aKlTRxOI7HrdL5wFgWG
 QRTb/k+mgoEQk8+9ElThzq/CkmQovPEUfhoxW7bE4zH9kVoxi2klFbkASZynqEFe
 a+TVQoDUnXvb1cbvr4zEVuD79QEmazD/bgc+gquxChCHfzX8ip4R0aCZM1ceTgm/
 Vru0LUKGQTWXPPReugJbOOtoIJ/kgD9Sg5xa7Icg3nxukgiYUDdl3e7MTgfvHOK6
 Fwwj+ZbM/yV/gpAQp+g+uOkKSFqfulyOb+nzX5tmebmiT2Vs6XSQ0Xo+fjm7N1QC
 j0G1vwz91xETK/gw2U/zL/HQVB3IU/2dtBT2ek4x6kmVL3rmHYoI6r2ofQcEFjGn
 2YQ9yvvT/fY6fza88kWO0PjgIRDzw9D9ihfZVyH9MCy5n6adhWlFXIg0HbAoecDE
 6xsVjb5BVYJfQvVz3FauGRXu6i3mePaURC1rrf5NKFfAWJP7pDfi9IvSL56u2aMt
 J+RJ7a2u1l1z/yhBxtr00KhMP586OZHVJwQvwNJV7mzBFhvOlm3a4jTzbG35dE+V
 MfbbjR628y/0IkqZiB7YVu1NIF2qdbZosv4nO7b584Q1h1NH/PU=
 =LOgM
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This series introduces a device aliasing feature where user can carve
  out partitions but reclaim the space back by deleting aliased file in
  root dir.

  In addition to that, there're numerous minor bug fixes in zoned device
  support, checkpoint=disable, extent cache management, fiemap, and
  lazytime mount option. The full list of noticeable changes can be
  found below.

  Enhancements:
   - introduce device aliasing file
   - add stats in debugfs to show multiple devices
   - add a sysfs node to limit max read extent count per-inode
   - modify f2fs_is_checkpoint_ready logic to allow more data to be
     written with the CP disable
   - decrease spare area for pinned files for zoned devices

  Fixes:
   - Revert "f2fs: remove unreachable lazytime mount option parsing"
   - adjust unusable cap before checkpoint=disable mode
   - fix to drop all discards after creating snapshot on lvm device
   - fix to shrink read extent node in batches
   - fix changing cursegs if recovery fails on zoned device
   - fix to adjust appropriate length for fiemap
   - fix fiemap failure issue when page size is 16KB
   - fix to avoid forcing direct write to use buffered IO on inline_data
     inode
   - fix to map blocks correctly for direct write
   - fix to account dirty data in __get_secs_required()
   - fix null-ptr-deref in f2fs_submit_page_bio()
   - fix inconsistent update of i_blocks in release_compress_blocks and
     reserve_compress_blocks"

* tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
  f2fs: fix to drop all discards after creating snapshot on lvm device
  f2fs: add a sysfs node to limit max read extent count per-inode
  f2fs: fix to shrink read extent node in batches
  f2fs: print message if fscorrupted was found in f2fs_new_node_page()
  f2fs: clear SBI_POR_DOING before initing inmem curseg
  f2fs: fix changing cursegs if recovery fails on zoned device
  f2fs: adjust unusable cap before checkpoint=disable mode
  f2fs: fix to requery extent which cross boundary of inquiry
  f2fs: fix to adjust appropriate length for fiemap
  f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK}
  f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow
  f2fs: replace deprecated strcpy with strscpy
  Revert "f2fs: remove unreachable lazytime mount option parsing"
  f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode
  f2fs: fix to map blocks correctly for direct write
  f2fs: fix race in concurrent f2fs_stop_gc_thread
  f2fs: fix fiemap failure issue when page size is 16KB
  f2fs: remove redundant atomic file check in defragment
  f2fs: fix to convert log type to segment data type correctly
  f2fs: clean up the unused variable additional_reserved_segments
  ...
2024-11-26 12:50:58 -08:00
Christian Brauner cf87766dd6
Merge branch 'ovl.fixes'
Bring in an overlayfs fix for v6.13-rc1 that fixes a bug introduced by
the overlayfs changes merged for v6.13.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-26 18:15:06 +01:00
Randy Dunlap c66f759832
fs_parser: update mount_api doc to match function signature
Add the missing 'name' parameter to the mount_api documentation for
fs_validate_description().

Fixes: 96cafb9ccb ("fs_parser: remove fs_parameter_description name field")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20241125215021.231758-1-rdunlap@infradead.org
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-26 10:32:20 +01:00
Randy Dunlap 1e726223be Documentation: filesystems: update filename extensions
Update references to most txt files to rst files.
Update one reference to an md file to a rst file.
Update one file path to its current location.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
Cc: Ian Kent <raven@themaw.net>
Cc: autofs@vger.kernel.org
Cc: Alexander Aring <aahringo@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: gfs2@lists.linux.dev
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Theodore Y. Ts'o <tytso@mit.edu>
Cc: fsverity@lists.linux.dev
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: ocfs2-devel@lists.linux.dev
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241120055246.158368-1-rdunlap@infradead.org
2024-11-22 10:31:04 -07:00
Linus Torvalds c01f664e4c \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmc/WXkACgkQnJ2qBz9k
 QNnwjAf/c8K3Vhw9RuKMtPF0K+gC//0mLsq+WmgrtXfMLvbSymrACnwHFJzpNGeS
 iEqCYlCC7vlqzPXpsVRlFeHpM52oVnE/wFF0Hp1h/Y1oqbRSzur6iSl4epmmBN+K
 AsPoWEXco7ABqtrhoZb0b1n7io9VorHN4nLhO6KWD83nZAawJDWgSw0sNCqcT6to
 vVxR3baP/EhONxNquxXe2lxq26dMilehmTk4AOyYslNYb0iG4r18TPyNb7fmuuKG
 M+nFfMnM9EPH8lnmgx6Mg/X77d/eZoq4pMRmeqSsroB5k/AQJnNrGweNL1+yr7OY
 adWNOMGWdNNQXPFgGbL5yZwNZ64kRA==
 =Eq1B
 -----END PGP SIGNATURE-----

Merge tag 'reiserfs_delete' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull reiserfs removal from Jan Kara:
 "The deprecation period of reiserfs is ending at the end of this year
  so it is time to remove it"

* tag 'reiserfs_delete' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  reiserfs: The last commit
2024-11-21 09:50:18 -08:00
Linus Torvalds ba1f9c8fe3 arm64 updates for 6.13:
* Support for running Linux in a protected VM under the Arm Confidential
   Compute Architecture (CCA)
 
 * Guarded Control Stack user-space support. Current patches follow the
   x86 ABI of implicitly creating a shadow stack on clone(). Subsequent
   patches (already on the list) will add support for clone3() allowing
   finer-grained control of the shadow stack size and placement from libc
 
 * AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are
   getting close with the upcoming dpISA support)
 
 * Other arch features:
 
   - In-kernel use of the memcpy instructions, FEAT_MOPS (previously only
     exposed to user; uaccess support not merged yet)
 
   - MTE: hugetlbfs support and the corresponding kselftests
 
   - Optimise CRC32 using the PMULL instructions
 
   - Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG
 
   - Optimise the kernel TLB flushing to use the range operations
 
   - POE/pkey (permission overlays): further cleanups after bringing the
     signal handler in line with the x86 behaviour for 6.12
 
 * arm64 perf updates:
 
   - Support for the NXP i.MX91 PMU in the existing IMX driver
 
   - Support for Ampere SoCs in the Designware PCIe PMU driver
 
   - Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC
 
   - Support for Samsung's 'Mongoose' CPU PMU
 
   - Support for PMUv3.9 finer-grained userspace counter access control
 
   - Switch back to platform_driver::remove() now that it returns 'void'
 
   - Add some missing events for the CXL PMU driver
 
 * Miscellaneous arm64 fixes/cleanups:
 
   - Page table accessors cleanup: type updates, drop unused macros,
     reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity
     check addresses before runtime P4D/PUD folding
 
   - Command line override for ID_AA64MMFR0_EL1.ECV (advertising the
     FEAT_ECV for the generic timers) allowing Linux to boot with
     firmware deployments that don't set SCTLR_EL3.ECVEn
 
   - ACPI/arm64: tighten the check for the array of platform timer
     structures and adjust the error handling procedure in
     gtdt_parse_timer_block()
 
   - Optimise the cache flush for the uprobes xol slot (skip if no
     change) and other uprobes/kprobes cleanups
 
   - Fix the context switching of tpidrro_el0 when kpti is enabled
 
   - Dynamic shadow call stack fixes
 
   - Sysreg updates
 
   - Various arm64 kselftest improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmc5POIACgkQa9axLQDI
 XvEDYA//a3eeNkgMuGdnSCVcLz+zy+oNwAwboG/4X1DqL8jiCbI4npwugPx95RIA
 YZOUvo9T2aL3OyefpUHll4gFHqx9OwoZIig2F70TEUmlPsGUbh0KBkdfQF3xZPdl
 EwV0kHSGEqMWMBwsGJGwgCYrUaf1MUQzh1GBl7VJ2ts5XsJBaBeOyKkysij26wtZ
 V+aHq2IUx7qQS7+HC/4P6IoHxKziFcsCMovaKaynP4cw9xXBQbDMcNlHEwndOMyk
 pu2zrv7GG0j3KQuVP/2Alf5FKhmI0GVGP/6Nc/zsOmw96w8Kf7HfzEtkHawr2aRq
 rqg/c9ivzDn1p+fUBo4ZYtrRk4IAY+yKu6hdzdLTP5+bQrBTWTO9rjQVBm9FAGYT
 sCdEj1NqzvExvNHD7X6ut/GJ05lmce3K+qeSXSEysN9gqiT3eomYWMXrD2V2lxzb
 rIDDcb/icfaqjt14Mksh19r/rzNeq7noj9CGSmcqw0BHZfHzl38Lai6pdfYzCNyn
 vCM/c4c1D/WWX8/lifO1JZVbhDk1jy82Iphg2KEhL8iKPxDsKBBZLmYuU1oa7tMo
 WryGAz9+GQwd+W9chFuaOEtMnzvW2scEJ5Eb2fEf0Qj0aEurkL+C9dZR6o1GN77V
 DBUxtU628Ef4PJJGfbNCwZzdd8UPYG3a/mKfQQ3dz0oz2LySlW4=
 =wDot
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Support for running Linux in a protected VM under the Arm
   Confidential Compute Architecture (CCA)

 - Guarded Control Stack user-space support. Current patches follow the
   x86 ABI of implicitly creating a shadow stack on clone(). Subsequent
   patches (already on the list) will add support for clone3() allowing
   finer-grained control of the shadow stack size and placement from
   libc

 - AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are
   getting close with the upcoming dpISA support)

 - Other arch features:

     - In-kernel use of the memcpy instructions, FEAT_MOPS (previously
       only exposed to user; uaccess support not merged yet)

     - MTE: hugetlbfs support and the corresponding kselftests

     - Optimise CRC32 using the PMULL instructions

     - Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG

     - Optimise the kernel TLB flushing to use the range operations

     - POE/pkey (permission overlays): further cleanups after bringing
       the signal handler in line with the x86 behaviour for 6.12

 - arm64 perf updates:

     - Support for the NXP i.MX91 PMU in the existing IMX driver

     - Support for Ampere SoCs in the Designware PCIe PMU driver

     - Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC

     - Support for Samsung's 'Mongoose' CPU PMU

     - Support for PMUv3.9 finer-grained userspace counter access
       control

     - Switch back to platform_driver::remove() now that it returns
       'void'

     - Add some missing events for the CXL PMU driver

 - Miscellaneous arm64 fixes/cleanups:

     - Page table accessors cleanup: type updates, drop unused macros,
       reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity
       check addresses before runtime P4D/PUD folding

     - Command line override for ID_AA64MMFR0_EL1.ECV (advertising the
       FEAT_ECV for the generic timers) allowing Linux to boot with
       firmware deployments that don't set SCTLR_EL3.ECVEn

     - ACPI/arm64: tighten the check for the array of platform timer
       structures and adjust the error handling procedure in
       gtdt_parse_timer_block()

     - Optimise the cache flush for the uprobes xol slot (skip if no
       change) and other uprobes/kprobes cleanups

     - Fix the context switching of tpidrro_el0 when kpti is enabled

     - Dynamic shadow call stack fixes

     - Sysreg updates

     - Various arm64 kselftest improvements

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (168 commits)
  arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled
  kselftest/arm64: Try harder to generate different keys during PAC tests
  kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all()
  arm64/ptrace: Clarify documentation of VL configuration via ptrace
  kselftest/arm64: Corrupt P0 in the irritator when testing SSVE
  acpi/arm64: remove unnecessary cast
  arm64/mm: Change protval as 'pteval_t' in map_range()
  kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c
  kselftest/arm64: Add FPMR coverage to fp-ptrace
  kselftest/arm64: Expand the set of ZA writes fp-ptrace does
  kselftets/arm64: Use flag bits for features in fp-ptrace assembler code
  kselftest/arm64: Enable build of PAC tests with LLVM=1
  kselftest/arm64: Check that SVCR is 0 in signal handlers
  selftests/mm: Fix unused function warning for aarch64_write_signal_pkey()
  kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests
  kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test
  kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests
  kselftest/arm64: Fix build with stricter assemblers
  arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux()
  arm64/scs: Deal with 64-bit relative offsets in FDE frames
  ...
2024-11-18 18:10:37 -08:00
Linus Torvalds 241c7ed4d4 vfs-6.13.untorn.writes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcopwAKCRCRxhvAZXjc
 oitWAQD68PGFI6/ES9x+qGsDFEZBH08icuO+a9dyaZXyNRosDgD/ex2zHj6F7IzS
 Ghgb9jiqWQ8l2+PDYfisxa/0jiqCbAk=
 =DmXf
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs untorn write support from Christian Brauner:
 "An atomic write is a write issed with torn-write protection. This
  means for a power failure or any hardware failure all or none of the
  data from the write will be stored, never a mix of old and new data.

  This work is already supported for block devices. If a block device is
  opened with O_DIRECT and the block device supports atomic write, then
  FMODE_CAN_ATOMIC_WRITE is added to the file of the opened block
  device.

  This contains the work to expand atomic write support to filesystems,
  specifically ext4 and XFS. Currently, only support for writing exactly
  one filesystem block atomically is added.

  Since it's now possible to have filesystem block size > page size for
  XFS, it's possible to write 4K+ blocks atomically on x86"

* tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: drop an obsolete comment in iomap_dio_bio_iter
  ext4: Do not fallback to buffered-io for DIO atomic write
  ext4: Support setting FMODE_CAN_ATOMIC_WRITE
  ext4: Check for atomic writes support in write iter
  ext4: Add statx support for atomic writes
  xfs: Support setting FMODE_CAN_ATOMIC_WRITE
  xfs: Validate atomic writes
  xfs: Support atomic write for statx
  fs: iomap: Atomic write support
  fs: Export generic_atomic_write_valid()
  block: Add bdev atomic write limits helpers
  fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
  block/fs: Pass an iocb to generic_atomic_write_valid()
2024-11-18 11:30:09 -08:00
Linus Torvalds 7956186e75 vfs-6.13.tmpfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcZIgAKCRCRxhvAZXjc
 oge4AQDxhsKW+v/jKHydzqzwG3Ks7DIxrUg/mcGfdtBwjiWgvwEA8t0QAAfKECAK
 B0+bNKJ8XJRUtZ10Jgm3dzURbEhBWgU=
 =4Lui
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull tmpfs case folding updates from Christian Brauner:
 "This adds case-insensitive support for tmpfs.

  The work contained in here adds support for case-insensitive file
  names lookups in tmpfs. The main difference from other casefold
  filesystems is that tmpfs has no information on disk, just on RAM, so
  we can't use mkfs to create a case-insensitive tmpfs. For this
  implementation, there's a mount option for casefolding. The rest of
  the patchset follows a similar approach as ext4 and f2fs.

  The use case for this feature is similar to the use case for ext4, to
  better support compatibility layers (like Wine), particularly in
  combination with sandboxing/container tools (like Flatpak).

  Those containerization tools can share a subset of the host filesystem
  with an application. In the container, the root directory and any
  parent directories required for a shared directory are on tmpfs, with
  the shared directories bind-mounted into the container's view of the
  filesystem.

  If the host filesystem is using case-insensitive directories, then the
  application can do lookups inside those directories in a
  case-insensitive way, without this needing to be implemented in
  user-space. However, if the host is only sharing a subset of a
  case-insensitive directory with the application, then the parent
  directories of the mount point will be part of the container's root
  tmpfs. When the application tries to do case-insensitive lookups of
  those parent directories on a case-sensitive tmpfs, the lookup will
  fail"

* tag 'vfs-6.13.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  tmpfs: Initialize sysfs during tmpfs init
  tmpfs: Fix type for sysfs' casefold attribute
  libfs: Fix kernel-doc warning in generic_ci_validate_strict_name
  docs: tmpfs: Add casefold options
  tmpfs: Expose filesystem features via sysfs
  tmpfs: Add flag FS_CASEFOLD_FL support for tmpfs dirs
  tmpfs: Add casefold lookup support
  libfs: Export generic_ci_ dentry functions
  unicode: Recreate utf8_parse_version()
  unicode: Export latest available UTF-8 version number
  ext4: Use generic_ci_validate_strict_name helper
  libfs: Create the helper function generic_ci_validate_strict_name()
2024-11-18 11:05:26 -08:00
Linus Torvalds a29835c9d0 vfs-6.13.ovl
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcXJQAKCRCRxhvAZXjc
 opmaAQDuEpUXi3PSNwgXGRYAzeBpsonoNrBtzSRTlaza7KZCiAEAsFUSbQJNGdkN
 TqC9KvI1EiX2hVvWmVBVttBNH6CY+wY=
 =4le5
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull overlayfs updates from Christian Brauner:
 "Make overlayfs support specifying layers through file descriptors.

  Currently overlayfs only allows specifying layers through path names.
  This is inconvenient for users that want to assemble an overlayfs
  mount purely based on file descriptors:

  This enables user to specify both:

    fsconfig(fd_overlay, FSCONFIG_SET_FD, "upperdir+", NULL, fd_upper);
    fsconfig(fd_overlay, FSCONFIG_SET_FD, "workdir+",  NULL, fd_work);
    fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower1);
    fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower2);

  in addition to:

    fsconfig(fd_overlay, FSCONFIG_SET_STRING, "upperdir+", "/upper",  0);
    fsconfig(fd_overlay, FSCONFIG_SET_STRING, "workdir+",  "/work",   0);
    fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower1", 0);
    fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower2", 0);

  There's also a large set of new overlayfs selftests to test new
  features and some older properties"

* tag 'vfs-6.13.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests: add test for specifying 500 lower layers
  selftests: add overlayfs fd mounting selftests
  selftests: use shared header
  Documentation,ovl: document new file descriptor based layers
  ovl: specify layers via file descriptors
  fs: add helper to use mount option as path or fd
2024-11-18 10:45:06 -08:00
Linus Torvalds 70e7730c2a vfs-6.13.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcToAAKCRCRxhvAZXjc
 osL9AP948FFumJRC28gDJ4xp+X4eohNOfkgoEG8FTbF2zU6ulwD+O0pr26FqpFli
 pqlG+38UdATImpfqqWjPbb72sBYcfQg=
 =wLUh
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:

   - Fixup and improve NLM and kNFSD file lock callbacks

     Last year both GFS2 and OCFS2 had some work done to make their
     locking more robust when exported over NFS. Unfortunately, part of
     that work caused both NLM (for NFS v3 exports) and kNFSD (for
     NFSv4.1+ exports) to no longer send lock notifications to clients

     This in itself is not a huge problem because most NFS clients will
     still poll the server in order to acquire a conflicted lock

     It's important for NLM and kNFSD that they do not block their
     kernel threads inside filesystem's file_lock implementations
     because that can produce deadlocks. We used to make sure of this by
     only trusting that posix_lock_file() can correctly handle blocking
     lock calls asynchronously, so the lock managers would only setup
     their file_lock requests for async callbacks if the filesystem did
     not define its own lock() file operation

     However, when GFS2 and OCFS2 grew the capability to correctly
     handle blocking lock requests asynchronously, they started
     signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check
     for also trusting posix_lock_file() was inadvertently dropped, so
     now most filesystems no longer produce lock notifications when
     exported over NFS

     Fix this by using an fop_flag which greatly simplifies the problem
     and grooms the way for future uses by both filesystems and lock
     managers alike

   - Add a sysctl to delete the dentry when a file is removed instead of
     making it a negative dentry

     Commit 681ce86235 ("vfs: Delete the associated dentry when
     deleting a file") introduced an unconditional deletion of the
     associated dentry when a file is removed. However, this led to
     performance regressions in specific benchmarks, such as
     ilebench.sum_operations/s, prompting a revert in commit
     4a4be1ad3a ("Revert "vfs: Delete the associated dentry when
     deleting a file""). This reintroduces the concept conditionally
     through a sysctl

   - Expand the statmount() system call:

       * Report the filesystem subtype in a new fs_subtype field to
         e.g., report fuse filesystem subtypes

       * Report the superblock source in a new sb_source field

       * Add a new way to return filesystem specific mount options in an
         option array that returns filesystem specific mount options
         separated by zero bytes and unescaped. This allows caller's to
         retrieve filesystem specific mount options and immediately pass
         them to e.g., fsconfig() without having to unescape or split
         them

       * Report security (LSM) specific mount options in a separate
         security option array. We don't lump them together with
         filesystem specific mount options as security mount options are
         generic and most users aren't interested in them

         The format is the same as for the filesystem specific mount
         option array

   - Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command

   - Optimize acl_permission_check() to avoid costly {g,u}id ownership
     checks if possible

   - Use smp_mb__after_spinlock() to avoid full smp_mb() in evict()

   - Add synchronous wakeup support for ep_poll_callback.

     Currently, epoll only uses wake_up() to wake up task. But sometimes
     there are epoll users which want to use the synchronous wakeup flag
     to give a hint to the scheduler, e.g., the Android binder driver.
     So add a wake_up_sync() define, and use wake_up_sync() when sync is
     true in ep_poll_callback()

  Fixes:

   - Fix kernel documentation for inode_insert5() and iget5_locked()

   - Annotate racy epoll check on file->f_ep

   - Make F_DUPFD_QUERY associative

   - Avoid filename buffer overrun in initramfs

   - Don't let statmount() return empty strings

   - Add a cond_resched() to dump_user_range() to avoid hogging the CPU

   - Don't query the device logical blocksize multiple times for hfsplus

   - Make filemap_read() check that the offset is positive or zero

  Cleanups:

   - Various typo fixes

   - Cleanup wbc_attach_fdatawrite_inode()

   - Add __releases annotation to wbc_attach_and_unlock_inode()

   - Add hugetlbfs tracepoints

   - Fix various vfs kernel doc parameters

   - Remove obsolete TODO comment from io_cancel()

   - Convert wbc_account_cgroup_owner() to take a folio

   - Fix comments for BANDWITH_INTERVAL and wb_domain_writeout_add()

   - Reorder struct posix_acl to save 8 bytes

   - Annotate struct posix_acl with __counted_by()

   - Replace one-element array with flexible array member in freevxfs

   - Use idiomatic atomic64_inc_return() in alloc_mnt_ns()"

* tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
  statmount: retrieve security mount options
  vfs: make evict() use smp_mb__after_spinlock instead of smp_mb
  statmount: add flag to retrieve unescaped options
  fs: add the ability for statmount() to report the sb_source
  writeback: wbc_attach_fdatawrite_inode out of line
  writeback: add a __releases annoation to wbc_attach_and_unlock_inode
  fs: add the ability for statmount() to report the fs_subtype
  fs: don't let statmount return empty strings
  fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel()
  hfsplus: don't query the device logical block size multiple times
  freevxfs: Replace one-element array with flexible array member
  fs: optimize acl_permission_check()
  initramfs: avoid filename buffer overrun
  fs/writeback: convert wbc_account_cgroup_owner to take a folio
  acl: Annotate struct posix_acl with __counted_by()
  acl: Realign struct posix_acl to save 8 bytes
  epoll: Add synchronous wakeup support for ep_poll_callback
  coredump: add cond_resched() to dump_user_range
  mm/page-writeback.c: Fix comment of wb_domain_writeout_add()
  mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL
  ...
2024-11-18 09:35:30 -08:00
Linus Torvalds 6ac81fd55e vfs-6.13.mgtime
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcScQAKCRCRxhvAZXjc
 oj+5AP4k822a77wc/3iPFk379naIvQ4dsrgemh0/Pb6ZvzvkFQEAi3vFCfzCDR2x
 SkJF/RwXXKZv6U31QXMRt2Qo6wfBuAc=
 =nVlm
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs multigrain timestamps from Christian Brauner:
 "This is another try at implementing multigrain timestamps. This time
  with significant help from the timekeeping maintainers to reduce the
  performance impact.

  Thomas provided a base branch that contains the required timekeeping
  interfaces for the VFS. It serves as the base for the multi-grain
  timestamp work:

   - Multigrain timestamps allow the kernel to use fine-grained
     timestamps when an inode's attributes is being actively observed
     via ->getattr(). With this support, it's possible for a file to get
     a fine-grained timestamp, and another modified after it to get a
     coarse-grained stamp that is earlier than the fine-grained time. If
     this happens then the files can appear to have been modified in
     reverse order, which breaks VFS ordering guarantees.

     To prevent this, a floor value is maintained for multigrain
     timestamps. Whenever a fine-grained timestamp is handed out, record
     it, and when later coarse-grained stamps are handed out, ensure
     they are not earlier than that value. If the coarse-grained
     timestamp is earlier than the fine-grained floor, return the floor
     value instead.

     The timekeeper changes add a static singleton atomic64_t into
     timekeeper.c that is used to keep track of the latest fine-grained
     time ever handed out. This is tracked as a monotonic ktime_t value
     to ensure that it isn't affected by clock jumps. Because it is
     updated at different times than the rest of the timekeeper object,
     the floor value is managed independently of the timekeeper via a
     cmpxchg() operation, and sits on its own cacheline.

     Two new public timekeeper interfaces are added:

      (1) ktime_get_coarse_real_ts64_mg() fills a timespec64 with the
          later of the coarse-grained clock and the floor time

      (2) ktime_get_real_ts64_mg() gets the fine-grained clock value,
          and tries to swap it into the floor. A timespec64 is filled
          with the result.

   - The VFS has always used coarse-grained timestamps when updating the
     ctime and mtime after a change. This has the benefit of allowing
     filesystems to optimize away a lot metadata updates, down to around
     1 per jiffy, even when a file is under heavy writes.

     Unfortunately, this has always been an issue when we're exporting
     via NFSv3, which relies on timestamps to validate caches. A lot of
     changes can happen in a jiffy, so timestamps aren't sufficient to
     help the client decide when to invalidate the cache. Even with
     NFSv4, a lot of exported filesystems don't properly support a
     change attribute and are subject to the same problems with
     timestamp granularity. Other applications have similar issues with
     timestamps (e.g backup applications).

     If we were to always use fine-grained timestamps, that would
     improve the situation, but that becomes rather expensive, as the
     underlying filesystem would have to log a lot more metadata
     updates.

     This adds a way to only use fine-grained timestamps when they are
     being actively queried. Use the (unused) top bit in
     inode->i_ctime_nsec as a flag that indicates whether the current
     timestamps have been queried via stat() or the like. When it's set,
     we allow the kernel to use a fine-grained timestamp iff it's
     necessary to make the ctime show a different value.

     This solves the problem of being able to distinguish the timestamp
     between updates, but introduces a new problem: it's now possible
     for a file being changed to get a fine-grained timestamp. A file
     that is altered just a bit later can then get a coarse-grained one
     that appears older than the earlier fine-grained time. This
     violates timestamp ordering guarantees.

     This is where the earlier mentioned timkeeping interfaces help. A
     global monotonic atomic64_t value is kept that acts as a timestamp
     floor. When we go to stamp a file, we first get the latter of the
     current floor value and the current coarse-grained time. If the
     inode ctime hasn't been queried then we just attempt to stamp it
     with that value.

     If it has been queried, then first see whether the current coarse
     time is later than the existing ctime. If it is, then we accept
     that value. If it isn't, then we get a fine-grained time and try to
     swap that into the global floor. Whether that succeeds or fails, we
     take the resulting floor time, convert it to realtime and try to
     swap that into the ctime.

     We take the result of the ctime swap whether it succeeds or fails,
     since either is just as valid.

     Filesystems can opt into this by setting the FS_MGTIME fstype flag.
     Others should be unaffected (other than being subject to the same
     floor value as multigrain filesystems)"

* tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: reduce pointer chasing in is_mgtime() test
  tmpfs: add support for multigrain timestamps
  btrfs: convert to multigrain timestamps
  ext4: switch to multigrain timestamps
  xfs: switch to multigrain timestamps
  Documentation: add a new file documenting multigrain timestamps
  fs: add percpu counters for significant multigrain timestamp events
  fs: tracepoints around multigrain timestamp events
  fs: handle delegated timestamps in setattr_copy_mgtime
  timekeeping: Add percpu counter for tracking floor swap events
  timekeeping: Add interfaces for handling timestamps with a floor value
  fs: have setattr_copy handle multigrain timestamps appropriately
  fs: add infrastructure for multigrain timestamps
2024-11-18 09:15:39 -08:00
John Garry 9e0933c21c fs: iomap: Atomic write support
Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC
flag set.

Initially FSes (XFS) should only support writing a single FS block
atomically.

As with any atomic write, we should produce a single bio which covers the
complete write length.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
[djwong: clarify a couple of things in the docs]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-04 16:14:02 -08:00
Linus Torvalds d56239a82e vfs-6.12-rc6.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZyTGAQAKCRCRxhvAZXjc
 opd6AQCal4omyfS8FYe4VRRZ/0XHouagq99I0U0TAmKkvoKAsgD/XrdE+pSTEkPX
 Pv4T9phh1cZRxcyKVu77UoYkuHJEDAg=
 =Lu9R
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull filesystem fixes from Christian Brauner:
 "VFS:

   - Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set

   - Add a get_tree_bdev_flags() helper that allows to modify e.g.,
     whether errors are logged into the filesystem context during
     superblock creation. This is used by erofs to fix a userspace
     regression where an error is currently logged when its used on a
     regular file which is an new allowed mode in erofs.

  netfs:

   - Fix the sysfs debug path in the documentation.

   - Fix iov_iter_get_pages*() for folio queues by skipping the page
     extracation if we're at the end of a folio.

  afs:

   - Fix moving subdirectories to different parent directory.

  autofs:

   - Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in
     validate_dev_ioctl(). The actual ioctl number, not the ioctl
     command needs to be checked for autofs"

* tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP
  autofs: fix thinko in validate_dev_ioctl()
  iov_iter: Fix iov_iter_get_pages*() for folio_queue
  afs: Fix missing subdir edit when renamed between parent dirs
  doc: correcting the debug path for cachefiles
  erofs: use get_tree_bdev_flags() to avoid misleading messages
  fs/super.c: introduce get_tree_bdev_flags()
2024-11-01 07:37:10 -10:00
Daeho Jeong 128d333f0d f2fs: introduce device aliasing file
F2FS should understand how the device aliasing file works and support
deleting the file after use. A device aliasing file can be created by
mkfs.f2fs tool and it can map the whole device with an extent, not
using node blocks. The file space should be pinned and normally used for
read-only usages.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:19:00 +00:00
André Almeida a713f830c9
docs: tmpfs: Add casefold options
Document mounting options for casefold support in tmpfs.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-9-f443d5814194@igalia.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-28 13:36:55 +01:00
Hongbo Li 6b51b9f65c
doc: correcting the debug path for cachefiles
The original debug path is under "/sys/modules", that's
wrong. The real path in kernel is "/sys/module". So we
can correct it.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20241022013812.2880883-1-lihongbo22@huawei.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-24 13:50:27 +02:00
Linus Torvalds 7166c32651 vfs-6.12-rc5.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZxY6XAAKCRCRxhvAZXjc
 opmUAQCu4KhzBBdZmFw3AfZFNJvYb1onT4FiU0pnyGgfvzEdEwD6AlnlgQ7DL3ZN
 WBqBzUl+DpGYJfzhkqoEGH89Fagx7QM=
 =mm68
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "afs:
   - Fix a lock recursion in afs_wake_up_async_call() on ->notify_lock

 netfs:
   - Drop the references to a folio immediately after the folio has been
     extracted to prevent races with future I/O collection

   - Fix a documenation build error

   - Downgrade the i_rwsem for buffered writes to fix a cifs reported
     performance regression when switching to netfslib

  vfs:
   - Explicitly return -E2BIG from openat2() if the specified size is
     unexpectedly large. This aligns openat2() with other extensible
     struct based system calls

   - When copying a mount namespace ensure that we only try to remove
     the new copy from the mount namespace rbtree if it has already been
     added to it

  nilfs:
   - Clear the buffer delay flag when clearing the buffer state clags
     when a buffer head is discarded to prevent a kernel OOPs

  ocfs2:
   - Fix an unitialized value warning in ocfs2_setattr()

  proc:
   - Fix a kernel doc warning"

* tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  proc: Fix W=1 build kernel-doc warning
  afs: Fix lock recursion
  fs: Fix uninitialized value issue in from_kuid and from_kgid
  fs: don't try and remove empty rbtree node
  netfs: Downgrade i_rwsem for a buffered write
  nilfs2: fix kernel bug due to missing clearing of buffer delay flag
  openat2: explicitly return -E2BIG for (usize > PAGE_SIZE)
  netfs: fix documentation build error
  netfs: In readahead, put the folio refs as soon extracted
2024-10-21 10:48:24 -07:00
Jan Kara fb6f20ecb1 reiserfs: The last commit
Deprecation period of reiserfs ends with the end of this year so it is
time to remove it from the kernel.

Acked-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2024-10-21 16:29:38 +02:00
Christoph Hellwig caf0ea451d iomap: remove iomap_file_buffered_write_punch_delalloc
Currently iomap_file_buffered_write_punch_delalloc can be called from
XFS either with the invalidate lock held or not.  To fix this while
keeping the locking in the file system and not the iomap library
code we'll need to life the locking up into the file system.

To prepare for that, open code iomap_file_buffered_write_punch_delalloc
in the only caller, and instead export iomap_write_delalloc_release.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-15 11:37:42 +02:00
Christian Brauner a89ed67d3c
Documentation,ovl: document new file descriptor based layers
Add a minimal example how to specify layers via file descriptors.

Link: https://lore.kernel.org/r/20241014-work-overlayfs-v3-3-32b3fed1286e@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-14 16:31:16 +02:00
Christian Brauner b40508ca5d
Merge patch series "timekeeping/fs: multigrain timestamp redux"
Jeff Layton <jlayton@kernel.org> says:

The VFS has always used coarse-grained timestamps when updating the
ctime and mtime after a change. This has the benefit of allowing
filesystems to optimize away a lot metadata updates, down to around 1
per jiffy, even when a file is under heavy writes.

Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide when to invalidate the cache. Even with NFSv4, a lot of
exported filesystems don't properly support a change attribute and are
subject to the same problems with timestamp granularity. Other
applications have similar issues with timestamps (e.g backup
applications).

If we were to always use fine-grained timestamps, that would improve the
situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.

What we need is a way to only use fine-grained timestamps when they are
being actively queried. Use the (unused) top bit in inode->i_ctime_nsec
as a flag that indicates whether the current timestamps have been
queried via stat() or the like. When it's set, we allow the kernel to
use a fine-grained timestamp iff it's necessary to make the ctime show
a different value.

This solves the problem of being able to distinguish the timestamp
between updates, but introduces a new problem: it's now possible for a
file being changed to get a fine-grained timestamp. A file that is
altered just a bit later can then get a coarse-grained one that appears
older than the earlier fine-grained time. This violates timestamp
ordering guarantees.

To remedy this, keep a global monotonic atomic64_t value that acts as a
timestamp floor.  When we go to stamp a file, we first get the latter of
the current floor value and the current coarse-grained time. If the
inode ctime hasn't been queried then we just attempt to stamp it with
that value.

If it has been queried, then first see whether the current coarse time
is later than the existing ctime. If it is, then we accept that value.
If it isn't, then we get a fine-grained time and try to swap that into
the global floor. Whether that succeeds or fails, we take the resulting
floor time, convert it to realtime and try to swap that into the ctime.

We take the result of the ctime swap whether it succeeds or fails, since
either is just as valid.

Filesystems can opt into this by setting the FS_MGTIME fstype flag.
Others should be unaffected (other than being subject to the same floor
value as multigrain filesystems).

* patches from https://lore.kernel.org/r/20241002-mgtime-v10-0-d1c4717f5284@kernel.org:
  tmpfs: add support for multigrain timestamps
  btrfs: convert to multigrain timestamps
  ext4: switch to multigrain timestamps
  xfs: switch to multigrain timestamps
  Documentation: add a new file documenting multigrain timestamps
  fs: add percpu counters for significant multigrain timestamp events
  fs: tracepoints around multigrain timestamp events
  fs: handle delegated timestamps in setattr_copy_mgtime
  fs: have setattr_copy handle multigrain timestamps appropriately
  fs: add infrastructure for multigrain timestamps

Link: https://lore.kernel.org/r/20241002-mgtime-v10-0-d1c4717f5284@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-10 10:20:57 +02:00
Jeff Layton e3fad0376d
Documentation: add a new file documenting multigrain timestamps
Add a high-level document that describes how multigrain timestamps work,
rationale for them, and some info about implementation and tradeoffs.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # documentation bits
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20241002-mgtime-v10-8-d1c4717f5284@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-10 10:20:52 +02:00
Jonathan Corbet 368196e501
netfs: fix documentation build error
Commit 86b374d061 ("netfs: Remove fs/netfs/io.c") did what it said on the
tin, but failed to remove the reference to fs/netfs/io.c from the
documentation, leading to this docs build error:

  WARNING: kernel-doc './scripts/kernel-doc -rst -enable-lineno -sphinx-version 7.3.7 ./fs/netfs/io.c' failed with return code 1

Remove the offending kernel-doc line, making the docs build process a
little happier.

Fixes: 86b374d061 ("netfs: Remove fs/netfs/io.c")
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/874j5nlu86.fsf@trenco.lwn.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-08 10:39:38 +02:00
Mark Brown ae80e1629a mm: Define VM_SHADOW_STACK for arm64 when we support GCS
Use VM_HIGH_ARCH_5 for guarded control stack pages.

Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-14-222b78d87eee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-04 12:04:36 +01:00
Christian Brauner 09ee2a670d
Merge patch series "Fixup NLM and kNFSD file lock callbacks"
Benjamin Coddington <bcodding@redhat.com> says:

Last year both GFS2 and OCFS2 had some work done to make their locking more
robust when exported over NFS.  Unfortunately, part of that work caused both
NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send
lock notifications to clients.

This in itself is not a huge problem because most NFS clients will still
poll the server in order to acquire a conflicted lock, but now that I've
noticed it I can't help but try to fix it because there are big advantages
for setups that might depend on timely lock notifications, and we've
supported that as a feature for a long time.

Its important for NLM and kNFSD that they do not block their kernel threads
inside filesystem's file_lock implementations because that can produce
deadlocks.  We used to make sure of this by only trusting that
posix_lock_file() can correctly handle blocking lock calls asynchronously,
so the lock managers would only setup their file_lock requests for async
callbacks if the filesystem did not define its own lock() file operation.

However, when GFS2 and OCFS2 grew the capability to correctly
handle blocking lock requests asynchronously, they started signalling this
behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting
posix_lock_file() was inadvertently dropped, so now most filesystems no
longer produce lock notifications when exported over NFS.

I tried to fix this by simply including the old check for lock(), but the
resulting include mess and layering violations was more than I could accept.
There's a much cleaner way presented here using an fop_flag, which while
potentially flag-greedy, greatly simplifies the problem and grooms the
way for future uses by both filesystems and lock managers alike.

* patches from https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com:
  exportfs: Remove EXPORT_OP_ASYNC_LOCK
  NLM/NFSD: Fix lock notifications for async-capable filesystems
  gfs2/ocfs2: set FOP_ASYNC_LOCK
  fs: Introduce FOP_ASYNC_LOCK
  NFS: trace: show TIMEDOUT instead of 0x6e
  nfsd: use system_unbound_wq for nfsd_file_gc_worker()
  nfsd: count nfsd_file allocations
  nfsd: fix refcount leak when file is unhashed after being found
  nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire
  nfsd: add list_head nf_gc to struct nfsd_file

Link: https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-02 07:52:07 +02:00
Benjamin Coddington b875bd5b38
exportfs: Remove EXPORT_OP_ASYNC_LOCK
Now that GFS2 and OCFS2 are signalling async ->lock() support with
FOP_ASYNC_LOCK and checks for support are converted, we can remove
EXPORT_OP_ASYNC_LOCK.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/0a114db814fec3086f937ae3d44a086f13b8de26.1726083391.git.bcodding@redhat.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01 17:01:08 +02:00
Linus Torvalds 4965ddb166 USB/Thunderbolt update for 6.12-rc1
Here is the large set of USB and Thunderbolt changes for 6.12-rc1.
 
 Nothing "major" in here, except for a new 9p network gadget that has
 been worked on for a long time (all of the needed acks are here.)  Other
 than that, it's the usual set of:
   - Thunderbolt / USB4 driver updates and additions for new hardware
   - dwc3 driver updates and new features added
   - xhci driver updates
   - typec driver updates
   - USB gadget updates and api additions to make some gadgets more
     configurable by userspace
   - dwc2 driver updates
   - usb phy driver updates
   - usbip feature additions
   - other minor USB driver updates
 
 All of these have been in linux-next for a long time with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZvU0/g8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykGcACfSqouxRg8FRtq+nIKHWXI9lOTnVcAoKd9PAgq
 1i7yCNopPEPEW8sjz1GX
 =mY+S
 -----END PGP SIGNATURE-----

Merge tag 'usb-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB/Thunderbolt updates from Greg KH:
 "Here is the large set of USB and Thunderbolt changes for 6.12-rc1.

  Nothing "major" in here, except for a new 9p network gadget that has
  been worked on for a long time (all of the needed acks are here)

  Other than that, it's the usual set of:

   - Thunderbolt / USB4 driver updates and additions for new hardware

   - dwc3 driver updates and new features added

   - xhci driver updates

   - typec driver updates

   - USB gadget updates and api additions to make some gadgets more
     configurable by userspace

   - dwc2 driver updates

   - usb phy driver updates

   - usbip feature additions

   - other minor USB driver updates

  All of these have been in linux-next for a long time with no reported
  issues"

* tag 'usb-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (145 commits)
  sub: cdns3: Use predefined PCI vendor ID constant
  sub: cdns2: Use predefined PCI vendor ID constant
  USB: misc: yurex: fix race between read and write
  USB: misc: cypress_cy7c63: check for short transfer
  USB: appledisplay: close race between probe and completion handler
  USB: class: CDC-ACM: fix race between get_serial and set_serial
  usb: r8a66597-hcd: make read-only const arrays static
  usb: typec: ucsi: Fix busy loop on ASUS VivoBooks
  usb: dwc3: rtk: Clean up error code in __get_dwc3_maximum_speed()
  usb: storage: ene_ub6250: Fix right shift warnings
  usb: roles: Improve the fix for a false positive recursive locking complaint
  locking/mutex: Introduce mutex_init_with_key()
  locking/mutex: Define mutex_init() once
  net/9p/usbg: fix CONFIG_USB_GADGET dependency
  usb: xhci: fix loss of data on Cadence xHC
  usb: xHCI: add XHCI_RESET_ON_RESUME quirk for Phytium xHCI host
  usb: dwc3: imx8mp: disable SS_CON and U3 wakeup for system sleep
  usb: dwc3: imx8mp: add 2 software managed quirk properties for host mode
  usb: host: xhci-plat: Parse xhci-missing_cas_quirk and apply quirk
  usb: misc: onboard_usb_dev: add Microchip usb5744 SMBus programming support
  ...
2024-09-26 09:45:36 -07:00
Linus Torvalds 684a64bf32 NFS Client Updates for Linux 6.12
New Features:
   * Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention
   * Add support for the LOCALIO protocol extention
 
 Bugfixes:
   * Fix memory leak in error path of nfs4_do_reclaim()
   * Simplify and guarantee lock owner uniqueness
   * Fix -Wformat-truncation warning
   * Fix folio refcounts by using folio_attach_private()
   * Fix failing the mount system call when the server is down
   * Fix detection of "Proxying of Times" server support
 
 Cleanups:
   * Annotate struct nfs_cache_array with __counted_by()
   * Remove unnecessary NULL checks before kfree()
   * Convert RPC_TASK_* constants to an enum
   * Remove obsolete or misleading comments and declerations
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmbzKDAACgkQ18tUv7Cl
 QOusjBAAxTSoVbHocl+9eYpvKnscPArgPXnfd6mB9rnQRgtnceTO2ei7cdiE2qhz
 dxQiyzlAXh3e7dGwoy02qEd6wTTqWeQ8ESdMpCAqSacBU4tu5owfzNSWunZvgYYj
 QOhjdmv8M1IfZnTstPlVrRNaZcDkhV1tzEtpZppkEqhTB0bHWqrcM4EdklTWT0Yc
 PGMpGbfuGsa4qZy2vWl7doERVEgK8mBeahLtYFD2W6phIvNWgD6IlKy66RaK2RfH
 nXmZoZbI2/ioi4TKvNyY8xoGMGvetLI1h8YNQYkEg060XCkisLZDOvoodUAylOTR
 2jHQLG5+/ejhpD/zgPghGZDSGNN1GyZaH09E/vtiS+3k9OXxFz6Rq68VnC6kpMA4
 TIUYsT8ejPzs2gW59iDFGB6cKI4XnRtxgmApW/Za0y9A72PSi+G/pbWAk7ThjTxf
 +HySsba4baA63opIgBSLVBrUsXZfdn/KTDTZ4nkPiq57BggGcZv7Y2ItOTXA+pB/
 5nigDKkhWsYVjMbkx6wmh+VO2gv4/Z8WqsmiDwFMpVqM0w8eycBOHjOumuuc6nmw
 y+2OKZqU2Npm2HI/R8lA7nB1m2QP5t7CRM2+xlZNuavHrfsMaqHNl8/9VgxlCATQ
 /Zo74hbhmCgQYxrTjL8XFQG9/8y0o3H5IcTEr/SgCVxHyDSan1I=
 =YjyC
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention
   - Add support for the LOCALIO protocol extention

  Bugfixes:
   - Fix memory leak in error path of nfs4_do_reclaim()
   - Simplify and guarantee lock owner uniqueness
   - Fix -Wformat-truncation warning
   - Fix folio refcounts by using folio_attach_private()
   - Fix failing the mount system call when the server is down
   - Fix detection of "Proxying of Times" server support

  Cleanups:
   - Annotate struct nfs_cache_array with __counted_by()
   - Remove unnecessary NULL checks before kfree()
   - Convert RPC_TASK_* constants to an enum
   - Remove obsolete or misleading comments and declerations"

* tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (41 commits)
  nfs: Fix `make htmldocs` warnings in the localio documentation
  nfs: add "NFS Client and Server Interlock" section to localio.rst
  nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst
  nfs: add Documentation/filesystems/nfs/localio.rst
  nfs: implement client support for NFS_LOCALIO_PROGRAM
  nfs/localio: use dedicated workqueues for filesystem read and write
  pnfs/flexfiles: enable localio support
  nfs: enable localio for non-pNFS IO
  nfs: add LOCALIO support
  nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit
  nfsd: implement server support for NFS_LOCALIO_PROGRAM
  nfsd: add LOCALIO support
  nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO
  nfs_common: add NFS LOCALIO auxiliary protocol enablement
  SUNRPC: replace program list with program array
  SUNRPC: add svcauth_map_clnt_to_svc_cred_local
  SUNRPC: remove call_allocate() BUG_ONs
  nfsd: add nfsd_serv_try_get and nfsd_serv_put
  nfsd: add nfsd_file_acquire_local()
  nfsd: factor out __fh_verify to allow NULL rqstp to be passed
  ...
2024-09-24 15:44:18 -07:00
Anna Schumaker 68898131d2 nfs: Fix `make htmldocs` warnings in the localio documentation
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 92945bd81c ("nfs: add Documentation/filesystems/nfs/localio.rst")
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-24 11:16:34 -04:00
Mike Snitzer 736cd2c1ae nfs: add "NFS Client and Server Interlock" section to localio.rst
This section answers a new FAQ entry:

9. How does LOCALIO make certain that object lifetimes are managed
   properly given NFSD and NFS operate in different contexts?

   See the detailed "NFS Client and Server Interlock" section below.

The first half of the section details NeilBrown's elegant design
for LOCALIO's nfs_uuid_t based interlock and is heavily based on
Neil's "net namespace refcounting" description here:
  https://marc.info/?l=linux-nfs&m=172498546024767&w=2

The second half of the section details the per-cpu-refcount introduced
to ensure NFSD's nfsd_serv isn't destroyed while in use by a LOCALIO
client.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:31 -04:00
Trond Myklebust f7128262b1 nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst
Add a FAQ section to give answers to questions that have been raised
during review of the localio feature.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:31 -04:00
Mike Snitzer 92945bd81c nfs: add Documentation/filesystems/nfs/localio.rst
This document gives an overview of the LOCALIO auxiliary RPC protocol
added to the Linux NFS client and server to allow them to reliably
handshake to determine if they are on the same host.

Once an NFS client and server handshake as "local", the client will
bypass the network RPC protocol for read, write and commit operations.
Due to this XDR and RPC bypass, these operations will operate faster.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:31 -04:00
Linus Torvalds b3f391fddf bcachefs changes for 6.12-rc1
rcu_pending, btree key cache rework: this solves lock contenting in the
 key cache, eliminating the biggest source of the srcu lock hold time
 warnings, and drastically improving performance on some metadata heavy
 workloads - on multithreaded creates we're now 3-4x faster than xfs.
 
 We're now using an rhashtable instead of the system inode hash table;
 this is another significant performance improvement on multithreaded
 metadata workloads, eliminating more lock contention.
 
 for_each_btree_key_in_subvolume_upto(): new helper for iterating over
 keys within a specific subvolume, eliminating a lot of open coded
 "subvolume_get_snapshot()" and also fixing another source of srcu lock
 time warnings, by running each loop iteration in its own transaction (as
 the existing for_each_btree_key() does).
 
 More work on btree_trans locking asserts; we now assert that we don't
 hold btree node locks when trans->locked is false, which is important
 because we don't use lockdep for tracking individual btree node locks.
 
 Some cleanups and improvements in the bset.c btree node lookup code,
 from Alan.
 
 Rework of btree node pinning, which we use in backpointers fsck. The old
 hacky implementation, where the shrinker just skipped over nodes in the
 pinned range, was causing OOMs; instead we now use another shrinker with
 a much higher seeks number for pinned nodes.
 
 Rebalance now uses BCH_WRITE_ONLY_SPECIFIED_DEVS; this fixes an issue
 where rebalance would sometimes fall back to allocating from the full
 filesystem, which is not what we want when it's trying to move data to a
 specific target.
 
 Use __GFP_ACCOUNT, GFP_RECLAIMABLE for btree node, key cache
 allocations.
 
 Idmap mounts are now supported - Hongbo.
 
 Rename whiteouts are now supported - Hongbo.
 
 Erasure coding can now handle devices being marked as failed, or
 forcibly removed. We still need the evacuate path for erasure coding,
 but it's getting very close to ready for people to start using.
 
 Status, and when will we be taking off experimental:
 ----------------------------------------------------
 
 Going by critical, user facing bugs getting found and fixed, we're
 nearly there. There are a couple key items that need to be finished
 before we can take off the experimental label:
 
 - The end-user experience is still pretty painful when the root
   filesystem needs a fsck; we need some form of limited self healing so
   that necessary repair gets run automatically. Errors (by type) are
   recorded in the superblock, so what we need to do next is convert
   remaining inconsistent() errors to fsck() errors (so that all runtime
   inconsistencies are logged in the superblock), and we need to go
   through the list of fsck errors and classify them by which fsck passes
   are needed to repair them.
 
 - We need comprehensive torture testing for all our repair paths, to
   shake out remaining bugs there. Thomas has been working on the tooling
   for this, so this is coming soonish.
 
 Slightly less critical items:
 
 - We need to improve the end-user experience for degraded mounts: right
   now, a degraded root filesystem means dropping to an initramfs shell
   or somehow inputting mount options manually (we don't want to allow
   degraded mounts without some form of user input, except on unattended
   servers) - we need the mount helper to prompt the user to allow
   mounting degraded, and make sure this works with systemd.
 
 - Scalabiity: we have users running 100TB+ filesystems, and that's
   effectively the limit right now due to fsck times. We have some
   reworks in the pipeline to address this, we're aiming to make petabyte
   sized filesystems practical.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmbvHQoACgkQE6szbY3K
 bnYfAw/+IXQ43/O+Jzs0MLD7pKZnrlbHiX9FqYLazD40vWvkyRTQOwgTn8pVNhq3
 4YWmtuZyqh036YC+bGqYFOhz20YetS5UdgbClpwmc99JJ6xsY+Z1mdpYfz5oq1Dw
 /pBX5iYb3rAt8UbQoZ8lcWM+GpT3GKJVgJuiLB2gRp9gATFesuh+0qU42oIVVVU5
 4y3VhDBUmRk4XqEnk8hr7EIDMW0wWP3aptxYMZzeUPW0x1cEQ+FWrJo5D6lXv2KK
 dKv3MogvA0FFNi/eNexclPiu2pXtI7vrxT7umsxAICHLt41rWpV5ttE6io3bC4ZN
 qvwF9w2CpmKPKchFru9PO+QrWHVR7e6bphwf3TzyoKZ7tTn42f1RQlub7gBzI3bz
 ai5ZwGRIvpUoPVBj+CO+Ipog81uUb23Ma+gXg1akEFBOAb+o7I3KOOSBh5l+0cHj
 3Ov1n0TLcsoO2cqoqfsV2QubW9YcWEZ76g5mKwQnUn8Cs6Fp0wWaIyK9aNkIAxcr
 tNDPGtH1gKitxUvju5i/LyI7y1UoeFvqJFee0VsU6QnixHn1ySzhePsJt6UEnIJT
 Ia3C96Igqu2mV9FxhfGHj/qi7TGjqqkZHa8+B610cDpgf15cx7Ps2DYjkuQMFCqZ
 Q3Q1o5De9roRq5xF2hLiYJCbzJKqd5ichFsBtLQuX572ICxbICg=
 =oVCy
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs

Pull bcachefs updates from Kent Overstreet:

 - rcu_pending, btree key cache rework: this solves lock contenting in
   the key cache, eliminating the biggest source of the srcu lock hold
   time warnings, and drastically improving performance on some metadata
   heavy workloads - on multithreaded creates we're now 3-4x faster than
   xfs.

 - We're now using an rhashtable instead of the system inode hash table;
   this is another significant performance improvement on multithreaded
   metadata workloads, eliminating more lock contention.

 - for_each_btree_key_in_subvolume_upto(): new helper for iterating over
   keys within a specific subvolume, eliminating a lot of open coded
   "subvolume_get_snapshot()" and also fixing another source of srcu
   lock time warnings, by running each loop iteration in its own
   transaction (as the existing for_each_btree_key() does).

 - More work on btree_trans locking asserts; we now assert that we don't
   hold btree node locks when trans->locked is false, which is important
   because we don't use lockdep for tracking individual btree node
   locks.

 - Some cleanups and improvements in the bset.c btree node lookup code,
   from Alan.

 - Rework of btree node pinning, which we use in backpointers fsck. The
   old hacky implementation, where the shrinker just skipped over nodes
   in the pinned range, was causing OOMs; instead we now use another
   shrinker with a much higher seeks number for pinned nodes.

 - Rebalance now uses BCH_WRITE_ONLY_SPECIFIED_DEVS; this fixes an issue
   where rebalance would sometimes fall back to allocating from the full
   filesystem, which is not what we want when it's trying to move data
   to a specific target.

 - Use __GFP_ACCOUNT, GFP_RECLAIMABLE for btree node, key cache
   allocations.

 - Idmap mounts are now supported (Hongbo Li)

 - Rename whiteouts are now supported (Hongbo Li)

 - Erasure coding can now handle devices being marked as failed, or
   forcibly removed. We still need the evacuate path for erasure coding,
   but it's getting very close to ready for people to start using.

* tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs: (99 commits)
  bcachefs: return err ptr instead of null in read sb clean
  bcachefs: Remove duplicated include in backpointers.c
  bcachefs: Don't drop devices with stripe pointers
  bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices
  bcachefs: bch_fs.rw_devs_change_count
  bcachefs: bch2_dev_remove_stripes()
  bcachefs: bch2_trigger_ptr() calculates sectors even when no device
  bcachefs: improve error messages in bch2_ec_read_extent()
  bcachefs: improve error message on too few devices for ec
  bcachefs: improve bch2_new_stripe_to_text()
  bcachefs: ec_stripe_head.nr_created
  bcachefs: bch_stripe.disk_label
  bcachefs: stripe_to_mem()
  bcachefs: EIO errcode cleanup
  bcachefs: Rework btree node pinning
  bcachefs: split up btree cache counters for live, freeable
  bcachefs: btree cache counters should be size_t
  bcachefs: Don't count "skipped access bit" as touched in btree cache scan
  bcachefs: Failed devices no longer require mounting in degraded mode
  bcachefs: bch2_dev_rcu_noerror()
  ...
2024-09-23 10:05:41 -07:00
Linus Torvalds 617a814f14 ALong with the usual shower of singleton patches, notable patch series in
this pull request are:
 
 "Align kvrealloc() with krealloc()" from Danilo Krummrich.  Adds
 consistency to the APIs and behaviour of these two core allocation
 functions.  This also simplifies/enables Rustification.
 
 "Some cleanups for shmem" from Baolin Wang.  No functional changes - mode
 code reuse, better function naming, logic simplifications.
 
 "mm: some small page fault cleanups" from Josef Bacik.  No functional
 changes - code cleanups only.
 
 "Various memory tiering fixes" from Zi Yan.  A small fix and a little
 cleanup.
 
 "mm/swap: remove boilerplate" from Yu Zhao.  Code cleanups and
 simplifications and .text shrinkage.
 
 "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt.  This
 is a feature, it adds new feilds to /proc/vmstat such as
 
     $ grep kstack /proc/vmstat
     kstack_1k 3
     kstack_2k 188
     kstack_4k 11391
     kstack_8k 243
     kstack_16k 0
 
 which tells us that 11391 processes used 4k of stack while none at all
 used 16k.  Useful for some system tuning things, but partivularly useful
 for "the dynamic kernel stack project".
 
 "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov.
 Teaches kmemleak to detect leaksage of percpu memory.
 
 "mm: memcg: page counters optimizations" from Roman Gushchin.  "3
 independent small optimizations of page counters".
 
 "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David
 Hildenbrand.  Improves PTE/PMD splitlock detection, makes powerpc/8xx work
 correctly by design rather than by accident.
 
 "mm: remove arch_make_page_accessible()" from David Hildenbrand.  Some
 folio conversions which make arch_make_page_accessible() unneeded.
 
 "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel.
 Cleans up and fixes our handling of the resetting of the cgroup/process
 peak-memory-use detector.
 
 "Make core VMA operations internal and testable" from Lorenzo Stoakes.
 Rationalizaion and encapsulation of the VMA manipulation APIs.  With a
 view to better enable testing of the VMA functions, even from a
 userspace-only harness.
 
 "mm: zswap: fixes for global shrinker" from Takero Funaki.  Fix issues in
 the zswap global shrinker, resulting in improved performance.
 
 "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao.  Fill in
 some missing info in /proc/zoneinfo.
 
 "mm: replace follow_page() by folio_walk" from David Hildenbrand.  Code
 cleanups and rationalizations (conversion to folio_walk()) resulting in
 the removal of follow_page().
 
 "improving dynamic zswap shrinker protection scheme" from Nhat Pham.  Some
 tuning to improve zswap's dynamic shrinker.  Significant reductions in
 swapin and improvements in performance are shown.
 
 "mm: Fix several issues with unaccepted memory" from Kirill Shutemov.
 Improvements to the new unaccepted memory feature,
 
 "mm/mprotect: Fix dax puds" from Peter Xu.  Implements mprotect on DAX
 PUDs.  This was missing, although nobody seems to have notied yet.
 
 "Introduce a store type enum for the Maple tree" from Sidhartha Kumar.
 Cleanups and modest performance improvements for the maple tree library
 code.
 
 "memcg: further decouple v1 code from v2" from Shakeel Butt.  Move more
 cgroup v1 remnants away from the v2 memcg code.
 
 "memcg: initiate deprecation of v1 features" from Shakeel Butt.  Adds
 various warnings telling users that memcg v1 features are deprecated.
 
 "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li.
 Greatly improves the success rate of the mTHP swap allocation.
 
 "mm: introduce numa_memblks" from Mike Rapoport.  Moves various disparate
 per-arch implementations of numa_memblk code into generic code.
 
 "mm: batch free swaps for zap_pte_range()" from Barry Song.  Greatly
 improves the performance of munmap() of swap-filled ptes.
 
 "support large folio swap-out and swap-in for shmem" from Baolin Wang.
 With this series we no longer split shmem large folios into simgle-page
 folios when swapping out shmem.
 
 "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao.  Nice performance
 improvements and code reductions for gigantic folios.
 
 "support shmem mTHP collapse" from Baolin Wang.  Adds support for
 khugepaged's collapsing of shmem mTHP folios.
 
 "mm: Optimize mseal checks" from Pedro Falcato.  Fixes an mprotect()
 performance regression due to the addition of mseal().
 
 "Increase the number of bits available in page_type" from Matthew Wilcox.
 Increases the number of bits available in page_type!
 
 "Simplify the page flags a little" from Matthew Wilcox.  Many legacy page
 flags are now folio flags, so the page-based flags and their
 accessors/mutators can be removed.
 
 "mm: store zero pages to be swapped out in a bitmap" from Usama Arif.  An
 optimization which permits us to avoid writing/reading zero-filled zswap
 pages to backing store.
 
 "Avoid MAP_FIXED gap exposure" from Liam Howlett.  Fixes a race window
 which occurs when a MAP_FIXED operqtion is occurring during an unrelated
 vma tree walk.
 
 "mm: remove vma_merge()" from Lorenzo Stoakes.  Major rotorooting of the
 vma_merge() functionality, making ot cleaner, more testable and better
 tested.
 
 "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.  Minor
 fixups of DAMON selftests and kunit tests.
 
 "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.  Code
 cleanups and folio conversions.
 
 "Shmem mTHP controls and stats improvements" from Ryan Roberts.  Cleanups
 for shmem controls and stats.
 
 "mm: count the number of anonymous THPs per size" from Barry Song.  Expose
 additional anon THP stats to userspace for improved tuning.
 
 "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio
 conversions and removal of now-unused page-based APIs.
 
 "replace per-quota region priorities histogram buffer with per-context
 one" from SeongJae Park.  DAMON histogram rationalization.
 
 "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae
 Park.  DAMON documentation updates.
 
 "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve
 related doc and warn" from Jason Wang: fixes usage of page allocator
 __GFP_NOFAIL and GFP_ATOMIC flags.
 
 "mm: split underused THPs" from Yu Zhao.  Improve THP=always policy - this
 was overprovisioning THPs in sparsely accessed memory areas.
 
 "zram: introduce custom comp backends API" frm Sergey Senozhatsky.  Add
 support for zram run-time compression algorithm tuning.
 
 "mm: Care about shadow stack guard gap when getting an unmapped area" from
 Mark Brown.  Fix up the various arch_get_unmapped_area() implementations
 to better respect guard areas.
 
 "Improve mem_cgroup_iter()" from Kinsey Ho.  Improve the reliability of
 mem_cgroup_iter() and various code cleanups.
 
 "mm: Support huge pfnmaps" from Peter Xu.  Extends the usage of huge
 pfnmap support.
 
 "resource: Fix region_intersects() vs add_memory_driver_managed()" from
 Huang Ying.  Fix a bug in region_intersects() for systems with CXL memory.
 
 "mm: hwpoison: two more poison recovery" from Kefeng Wang.  Teaches a
 couple more code paths to correctly recover from the encountering of
 poisoned memry.
 
 "mm: enable large folios swap-in support" from Barry Song.  Support the
 swapin of mTHP memory into appropriately-sized folios, rather than into
 single-page folios.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu1BBwAKCRDdBJ7gKXxA
 jlWNAQDYlqQLun7bgsAN4sSvi27VUuWv1q70jlMXTfmjJAvQqwD/fBFVR6IOOiw7
 AkDbKWP2k0hWPiNJBGwoqxdHHx09Xgo=
 =s0T+
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Along with the usual shower of singleton patches, notable patch series
  in this pull request are:

   - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
     consistency to the APIs and behaviour of these two core allocation
     functions. This also simplifies/enables Rustification.

   - "Some cleanups for shmem" from Baolin Wang. No functional changes -
     mode code reuse, better function naming, logic simplifications.

   - "mm: some small page fault cleanups" from Josef Bacik. No
     functional changes - code cleanups only.

   - "Various memory tiering fixes" from Zi Yan. A small fix and a
     little cleanup.

   - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
     simplifications and .text shrinkage.

   - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
     Butt. This is a feature, it adds new feilds to /proc/vmstat such as

       $ grep kstack /proc/vmstat
       kstack_1k 3
       kstack_2k 188
       kstack_4k 11391
       kstack_8k 243
       kstack_16k 0

     which tells us that 11391 processes used 4k of stack while none at
     all used 16k. Useful for some system tuning things, but
     partivularly useful for "the dynamic kernel stack project".

   - "kmemleak: support for percpu memory leak detect" from Pavel
     Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory.

   - "mm: memcg: page counters optimizations" from Roman Gushchin. "3
     independent small optimizations of page counters".

   - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
     David Hildenbrand. Improves PTE/PMD splitlock detection, makes
     powerpc/8xx work correctly by design rather than by accident.

   - "mm: remove arch_make_page_accessible()" from David Hildenbrand.
     Some folio conversions which make arch_make_page_accessible()
     unneeded.

   - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David
     Finkel. Cleans up and fixes our handling of the resetting of the
     cgroup/process peak-memory-use detector.

   - "Make core VMA operations internal and testable" from Lorenzo
     Stoakes. Rationalizaion and encapsulation of the VMA manipulation
     APIs. With a view to better enable testing of the VMA functions,
     even from a userspace-only harness.

   - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
     issues in the zswap global shrinker, resulting in improved
     performance.

   - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
     in some missing info in /proc/zoneinfo.

   - "mm: replace follow_page() by folio_walk" from David Hildenbrand.
     Code cleanups and rationalizations (conversion to folio_walk())
     resulting in the removal of follow_page().

   - "improving dynamic zswap shrinker protection scheme" from Nhat
     Pham. Some tuning to improve zswap's dynamic shrinker. Significant
     reductions in swapin and improvements in performance are shown.

   - "mm: Fix several issues with unaccepted memory" from Kirill
     Shutemov. Improvements to the new unaccepted memory feature,

   - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
     DAX PUDs. This was missing, although nobody seems to have notied
     yet.

   - "Introduce a store type enum for the Maple tree" from Sidhartha
     Kumar. Cleanups and modest performance improvements for the maple
     tree library code.

   - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
     more cgroup v1 remnants away from the v2 memcg code.

   - "memcg: initiate deprecation of v1 features" from Shakeel Butt.
     Adds various warnings telling users that memcg v1 features are
     deprecated.

   - "mm: swap: mTHP swap allocator base on swap cluster order" from
     Chris Li. Greatly improves the success rate of the mTHP swap
     allocation.

   - "mm: introduce numa_memblks" from Mike Rapoport. Moves various
     disparate per-arch implementations of numa_memblk code into generic
     code.

   - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
     improves the performance of munmap() of swap-filled ptes.

   - "support large folio swap-out and swap-in for shmem" from Baolin
     Wang. With this series we no longer split shmem large folios into
     simgle-page folios when swapping out shmem.

   - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
     performance improvements and code reductions for gigantic folios.

   - "support shmem mTHP collapse" from Baolin Wang. Adds support for
     khugepaged's collapsing of shmem mTHP folios.

   - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
     performance regression due to the addition of mseal().

   - "Increase the number of bits available in page_type" from Matthew
     Wilcox. Increases the number of bits available in page_type!

   - "Simplify the page flags a little" from Matthew Wilcox. Many legacy
     page flags are now folio flags, so the page-based flags and their
     accessors/mutators can be removed.

   - "mm: store zero pages to be swapped out in a bitmap" from Usama
     Arif. An optimization which permits us to avoid writing/reading
     zero-filled zswap pages to backing store.

   - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
     window which occurs when a MAP_FIXED operqtion is occurring during
     an unrelated vma tree walk.

   - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
     the vma_merge() functionality, making ot cleaner, more testable and
     better tested.

   - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
     Minor fixups of DAMON selftests and kunit tests.

   - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
     Code cleanups and folio conversions.

   - "Shmem mTHP controls and stats improvements" from Ryan Roberts.
     Cleanups for shmem controls and stats.

   - "mm: count the number of anonymous THPs per size" from Barry Song.
     Expose additional anon THP stats to userspace for improved tuning.

   - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
     folio conversions and removal of now-unused page-based APIs.

   - "replace per-quota region priorities histogram buffer with
     per-context one" from SeongJae Park. DAMON histogram
     rationalization.

   - "Docs/damon: update GitHub repo URLs and maintainer-profile" from
     SeongJae Park. DAMON documentation updates.

   - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
     improve related doc and warn" from Jason Wang: fixes usage of page
     allocator __GFP_NOFAIL and GFP_ATOMIC flags.

   - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
     This was overprovisioning THPs in sparsely accessed memory areas.

   - "zram: introduce custom comp backends API" frm Sergey Senozhatsky.
     Add support for zram run-time compression algorithm tuning.

   - "mm: Care about shadow stack guard gap when getting an unmapped
     area" from Mark Brown. Fix up the various arch_get_unmapped_area()
     implementations to better respect guard areas.

   - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
     of mem_cgroup_iter() and various code cleanups.

   - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
     pfnmap support.

   - "resource: Fix region_intersects() vs add_memory_driver_managed()"
     from Huang Ying. Fix a bug in region_intersects() for systems with
     CXL memory.

   - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
     a couple more code paths to correctly recover from the encountering
     of poisoned memry.

   - "mm: enable large folios swap-in support" from Barry Song. Support
     the swapin of mTHP memory into appropriately-sized folios, rather
     than into single-page folios"

* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
  zram: free secondary algorithms names
  uprobes: turn xol_area->pages[2] into xol_area->page
  uprobes: introduce the global struct vm_special_mapping xol_mapping
  Revert "uprobes: use vm_special_mapping close() functionality"
  mm: support large folios swap-in for sync io devices
  mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
  mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
  mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
  set_memory: add __must_check to generic stubs
  mm/vma: return the exact errno in vms_gather_munmap_vmas()
  memcg: cleanup with !CONFIG_MEMCG_V1
  mm/show_mem.c: report alloc tags in human readable units
  mm: support poison recovery from copy_present_page()
  mm: support poison recovery from do_cow_fault()
  resource, kunit: add test case for region_intersects()
  resource: make alloc_free_mem_region() works for iomem_resource
  mm: z3fold: deprecate CONFIG_Z3FOLD
  vfio/pci: implement huge_fault support
  mm/arm64: support large pfn mappings
  mm/x86: support large pfn mappings
  ...
2024-09-21 07:29:05 -07:00
Linus Torvalds 171754c380 vfs-6.12.blocksize
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEvwAKCRCRxhvAZXjc
 ohg3APwJWQnqFlBddcRl4yrPJ/cgcYSYAOdHb+E+blomSwdxcwEAmwsnLPNQOtw2
 rxKvQfZqhVT437bl7RpPPZrHGxwTng8=
 =6v1r
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12.blocksize' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull vfs blocksize updates from Christian Brauner:
 "This contains the vfs infrastructure as well as the xfs bits to enable
  support for block sizes (bs) larger than page sizes (ps) plus a few
  fixes to related infrastructure.

  There has been efforts over the last 16 years to enable enable Large
  Block Sizes (LBS), that is block sizes in filesystems where bs > page
  size. Through these efforts we have learned that one of the main
  blockers to supporting bs > ps in filesystems has been a way to
  allocate pages that are at least the filesystem block size on the page
  cache where bs > ps.

  Thanks to various previous efforts it is possible to support bs > ps
  in XFS with only a few changes in XFS itself. Most changes are to the
  page cache to support minimum order folio support for the target block
  size on the filesystem.

  A motivation for Large Block Sizes today is to support high-capacity
  (large amount of Terabytes) QLC SSDs where the internal Indirection
  Unit (IU) are typically greater than 4k to help reduce DRAM and so in
  turn cost and space. In practice this then allows different
  architectures to use a base page size of 4k while still enabling
  support for block sizes aligned to the larger IUs by relying on high
  order folios on the page cache when needed.

  It also allows to take advantage of the drive's support for atomics
  larger than 4k with buffered IO support in Linux. As described this
  year at LSFMM, supporting large atomics greater than 4k enables
  databases to remove the need to rely on their own journaling, so they
  can disable double buffered writes, which is a feature different cloud
  providers are already enabling through custom storage solutions"

* tag 'vfs-6.12.blocksize' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
  Documentation: iomap: fix a typo
  iomap: remove the iomap_file_buffered_write_punch_delalloc return value
  iomap: pass the iomap to the punch callback
  iomap: pass flags to iomap_file_buffered_write_punch_delalloc
  iomap: improve shared block detection in iomap_unshare_iter
  iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release
  docs:filesystems: fix spelling and grammar mistakes in iomap design page
  filemap: fix htmldoc warning for mapping_align_index()
  iomap: make zero range flush conditional on unwritten mappings
  iomap: fix handling of dirty folios over unwritten extents
  iomap: add a private argument for iomap_file_buffered_write
  iomap: remove set_memor_ro() on zero page
  xfs: enable block size larger than page size support
  xfs: make the calculation generic in xfs_sb_validate_fsb_count()
  xfs: expose block size in stat
  xfs: use kvmalloc for xattr buffers
  iomap: fix iomap_dio_zero() for fs bs > system page size
  filemap: cap PTE range to be created to allowed zero fill in folio_map_range()
  mm: split a folio in minimum folio order chunks
  readahead: allocate folios with mapping_min_order in readahead
  ...
2024-09-20 17:53:17 -07:00
Linus Torvalds 45d986d113 overlayfs updates for 6.12
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmbodTUACgkQEVvVyTe/
 1Wp5Jw/+LN99lGoKXKZ7zB8gta4hMm2O+wd3//IdfTi6u59uff0QvVNbczC6SRlJ
 sW6czIgjKpR1xt6Orp0Rof0VZK09PMqq3Iw7Tt3xEkJtjKxppfMCQHeoi03CH9kF
 gOHPW+Ad4jlCk+TYFEuHk5lQIgVtJ11JDtGjr5tNa03MbrLIa0pFpTdz65bg9uTn
 JdPqA+fiJ/mVoXlFQEZkQPeT8CgMfTbKtQDGfcmKzrNo0ZyQdW30u5lP3MGROptn
 X/QX5n77UrE9Du+UHPSaVDRUd8iA/ZcZe/7RAlKXWElHpKvAiqv84Sz0Mc1fczHx
 nQHaUfQixGs/vakkPXxVqKn/Xvnj4GDwUqyM9mG8kN1x4QpdyFRqM66zZxtWsSi5
 /PL0zwAY950a5fY4VsyRsSf6Zl/SlDdB+PVVajk6BKqMxWgvxyuVbFRUqfesHsfA
 V5GHil7F03sPzO12SvAJg8HuresCiafLdAPJgGXuGoeprmeSP9eYx0EqcUzLpHd7
 ORUF4qEylYFp8vZBJdBnZpSGRUv/mmjDDowsKeKyfgKyVR99lLKLCtqMJ0e0eGaE
 mpGNg+eq//snJldvBUHrMYQvyVm18/bbOzT0ZdGPp2bMUxLjSbiab7D3S9GKRxRt
 8GRg/bJfZw5tz2i1bxbe5IWfNij5pDilLeAQUsoO8pUwu/SrCy8=
 =wnbj
 -----END PGP SIGNATURE-----

Merge tag 'ovl-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs updates from Amir Goldstein:

 - Increase robustness of overlayfs to crashes in the case of underlying
   filesystems that to not guarantee metadata ordering to persistent
   storage (problem was reported with ubifs).

 - Deny mount inside container with features that require root
   privileges to work properly, instead of failing operations later.

 - Some clarifications to overlayfs documentation.

* tag 'ovl-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: fail if trusted xattrs are needed but caller lacks permission
  overlayfs.rst: update metacopy section in overlayfs documentation
  ovl: fsync after metadata copy-up
  ovl: don't set the superblock's errseq_t manually
2024-09-19 06:33:18 +02:00
Linus Torvalds d58db3f3a0 Another relatively mundane cycle for docs:
- The beginning of an EEVDF scheduler document
 
 - More Chinese translations
 
 - A rethrashing of our bisection documentation
 
 ...plus the usual array of smaller fixes, and more than the usual number of
 typo fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmboMnkACgkQF0NaE2wM
 flha/Qf/e8zRinIYQJ7BmombNm39w3wUiNuXr8SWq7afqhsAJJzmOZ3oyyfssL+B
 a1pSjhxb15UrKf1kMKhdBxhDndXvto5UekJRBY5gsTvcBMBmtIovN+ZK5Z5jObsw
 gzHD9of08Ti7N4C2dSBdLPHtvIBX0rVeEK4oAH7AUaQviu1cfTaLQQA0dRYsaJeX
 iXsts2NkGl6ZUF7mk4nlzj8+Y1zot+mCd6B53iSimNKxwsPODrCZUobJAvxg1qVU
 pRCQcnpx2fTBnh4ugrcLZbautyhL9bJ8VQzFeoQgYpODDgDnZyTjN6kxv65LpxAz
 dXi+hx5Vk7lP3BbTp9EeGn305/qQPA==
 =JuBw
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.12' of git://git.lwn.net/linux

Pull documentation update from Jonathan Corbet:
 "Another relatively mundane cycle for docs:

   - The beginning of an EEVDF scheduler document

   - More Chinese translations

   - A rethrashing of our bisection documentation

  ...plus the usual array of smaller fixes, and more than the usual
  number of typo fixes"

* tag 'docs-6.12' of git://git.lwn.net/linux: (48 commits)
  Remove duplicate "and" in 'Linux NVMe docs.
  docs:filesystems: fix spelling and grammar mistakes
  docs:filesystem: fix mispelled words on autofs page
  docs:mm: fixed spelling and grammar mistakes on vmalloc kernel stack page
  Documentation: PCI: fix typo in pci.rst
  docs/zh_CN: add the translation of kbuild/gcc-plugins.rst
  docs/process: fix typos
  docs:mm: fix spelling mistakes in heterogeneous memory management page
  accel/qaic: Fix a typo
  docs/zh_CN: update the translation of security-bugs
  docs: block: Fix grammar and spelling mistakes in bfq-iosched.rst
  Documentation: Fix spelling mistakes
  Documentation/gpu: Fix typo in Documentation/gpu/komeda-kms.rst
  scripts: sphinx-pre-install: remove unnecessary double check for $cur_version
  Loongarch: KVM: Add KVM hypercalls documentation for LoongArch
  Documentation: Document the kernel flag bdev_allow_write_mounted
  docs: scheduler: completion: Update member of struct completion
  docs: kerneldoc-preamble.sty: Suppress extra spaces in CJK literal blocks
  docs: submitting-patches: Advertise b4
  docs: update dev-tools/kcsan.rst url about KTSAN
  ...
2024-09-17 16:44:08 +02:00
Linus Torvalds a430d95c5e lsm/stable-6.12 PR 20240911
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmbiGGAUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPU8BAA1+A15pmS34I9pq7c8TmRz3rNEs/a
 zrW1aWJ0X/+axNS7sW3Pwtt1EKuaOhskKU8gNSieRhljC8rgXIVjZzLw6Atgcr5k
 upulGbU9TXyVisYN+PWv9/84ito6/nYsKb7Mg3nUVsdodtIFVnsk1fxYLPHQEBig
 Pl3i26U3VqH93Kz0W5vs/QR2uduPB8ZyscdTgcbrY9Vv1Y7IDZ2g9QsJVKLvbQKL
 qcPK1JkHa+sBPJxDqS9A40zgbLbdPQgWQzsXX3dz822w1Ga7FIHSqxMBA6HwHZ+L
 kV4P58wVfavhwt/cQSKMWI/yiGPMMd0B6yD+m8ojOvGfOfRCWxGMmEMqHNuZ3m7k
 Bfll5ZgZTY8phUUhiNf3nxO3F3MM/5bHdhPOj3RReqbAbS6uWr4/fThPDYY/zIo6
 NCY3HGxx3Ae64uQ01gC2p/czC50jDsMwlbXiZbrgdBhjBm/CVk5ozb80mLVcGrLB
 +6XMzzSbC8IaNAH2fDmUJ2ABdwyNPgsSOTGZVzIanpxu1SU2/yk3SMxkp8fv5s36
 wLeODUVcLgsjVV538Mkm6PGTE4TlXaH9yi6apMyJAGp0vPYx5c3Xxk2y5A5cur5p
 hcrbDiX2QgeqFbwsz36incmPmbef2NU2c8feR8XLtPJuwNIeRcMSje0pnkaFlRmb
 TAUJ1sDQAzZ8Fy0=
 =HIAO
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

 - Move the LSM framework to static calls

   This transitions the vast majority of the LSM callbacks into static
   calls. Those callbacks which haven't been converted were left as-is
   due to the general ugliness of the changes required to support the
   static call conversion; we can revisit those callbacks at a future
   date.

 - Add the Integrity Policy Enforcement (IPE) LSM

   This adds a new LSM, Integrity Policy Enforcement (IPE). There is
   plenty of documentation about IPE in this patches, so I'll refrain
   from going into too much detail here, but the basic motivation behind
   IPE is to provide a mechanism such that administrators can restrict
   execution to only those binaries which come from integrity protected
   storage, e.g. a dm-verity protected filesystem. You will notice that
   IPE requires additional LSM hooks in the initramfs, dm-verity, and
   fs-verity code, with the associated patches carrying ACK/review tags
   from the associated maintainers. We couldn't find an obvious
   maintainer for the initramfs code, but the IPE patchset has been
   widely posted over several years.

   Both Deven Bowers and Fan Wu have contributed to IPE's development
   over the past several years, with Fan Wu agreeing to serve as the IPE
   maintainer moving forward. Once IPE is accepted into your tree, I'll
   start working with Fan to ensure he has the necessary accounts, keys,
   etc. so that he can start submitting IPE pull requests to you
   directly during the next merge window.

 - Move the lifecycle management of the LSM blobs to the LSM framework

   Management of the LSM blobs (the LSM state buffers attached to
   various kernel structs, typically via a void pointer named "security"
   or similar) has been mixed, some blobs were allocated/managed by
   individual LSMs, others were managed by the LSM framework itself.

   Starting with this pull we move management of all the LSM blobs,
   minus the XFRM blob, into the framework itself, improving consistency
   across LSMs, and reducing the amount of duplicated code across LSMs.
   Due to some additional work required to migrate the XFRM blob, it has
   been left as a todo item for a later date; from a practical
   standpoint this omission should have little impact as only SELinux
   provides a XFRM LSM implementation.

 - Fix problems with the LSM's handling of F_SETOWN

   The LSM hook for the fcntl(F_SETOWN) operation had a couple of
   problems: it was racy with itself, and it was disconnected from the
   associated DAC related logic in such a way that the LSM state could
   be updated in cases where the DAC state would not. We fix both of
   these problems by moving the security_file_set_fowner() hook into the
   same section of code where the DAC attributes are updated. Not only
   does this resolve the DAC/LSM synchronization issue, but as that code
   block is protected by a lock, it also resolve the race condition.

 - Fix potential problems with the security_inode_free() LSM hook

   Due to use of RCU to protect inodes and the placement of the LSM hook
   associated with freeing the inode, there is a bit of a challenge when
   it comes to managing any LSM state associated with an inode. The VFS
   folks are not open to relocating the LSM hook so we have to get
   creative when it comes to releasing an inode's LSM state.
   Traditionally we have used a single LSM callback within the hook that
   is triggered when the inode is "marked for death", but not actually
   released due to RCU.

   Unfortunately, this causes problems for LSMs which want to take an
   action when the inode's associated LSM state is actually released; so
   we add an additional LSM callback, inode_free_security_rcu(), that is
   called when the inode's LSM state is released in the RCU free
   callback.

 - Refactor two LSM hooks to better fit the LSM return value patterns

   The vast majority of the LSM hooks follow the "return 0 on success,
   negative values on failure" pattern, however, there are a small
   handful that have unique return value behaviors which has caused
   confusion in the past and makes it difficult for the BPF verifier to
   properly vet BPF LSM programs. This includes patches to
   convert two of these"special" LSM hooks to the common 0/-ERRNO pattern.

 - Various cleanups and improvements

   A handful of patches to remove redundant code, better leverage the
   IS_ERR_OR_NULL() helper, add missing "static" markings, and do some
   minor style fixups.

* tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (40 commits)
  security: Update file_set_fowner documentation
  fs: Fix file_set_fowner LSM hook inconsistencies
  lsm: Use IS_ERR_OR_NULL() helper function
  lsm: remove LSM_COUNT and LSM_CONFIG_COUNT
  ipe: Remove duplicated include in ipe.c
  lsm: replace indirect LSM hook calls with static calls
  lsm: count the LSMs enabled at compile time
  kernel: Add helper macros for loop unrolling
  init/main.c: Initialize early LSMs after arch code, static keys and calls.
  MAINTAINERS: add IPE entry with Fan Wu as maintainer
  documentation: add IPE documentation
  ipe: kunit test for parser
  scripts: add boot policy generation program
  ipe: enable support for fs-verity as a trust provider
  fsverity: expose verified fsverity built-in signatures to LSMs
  lsm: add security_inode_setintegrity() hook
  ipe: add support for dm-verity as a trust provider
  dm-verity: expose root hash digest and signature data to LSMs
  block,lsm: add LSM blob and new LSM hooks for block devices
  ipe: add permissive toggle
  ...
2024-09-16 18:19:47 +02:00
Linus Torvalds 35219bc5c7 vfs-6.12.netfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEvgAKCRCRxhvAZXjc
 onQWAQD6IxAKPU0zom2FoWNilvSzPs7WglTtvddX9pu/lT1RNAD/YC/wOLW8mvAv
 9oTAmigQDQQhEWdJA9RgLZBiw7k+DAw=
 =zWFb
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull netfs updates from Christian Brauner:
 "This contains the work to improve read/write performance for the new
  netfs library.

  The main performance enhancing changes are:

   - Define a structure, struct folio_queue, and a new iterator type,
     ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See
     that patch for questions about naming and form.

     ITER_FOLIOQ is provided as a replacement for ITER_XARRAY. The
     problem with an xarray is that accessing it requires the use of a
     lock (typically the RCU read lock) - and this means that we can't
     supply iterate_and_advance() with a step function that might sleep
     (crypto for example) without having to drop the lock between pages.
     ITER_FOLIOQ is the iterator for a chain of folio_queue structs,
     where each folio_queue holds a small list of folios. A folio_queue
     struct is a simpler structure than xarray and is not subject to
     concurrent manipulation by the VM. folio_queue is used rather than
     a bvec[] as it can form lists of indefinite size, adding to one end
     and removing from the other on the fly.

   - Provide a copy_folio_from_iter() wrapper.

   - Make cifs RDMA support ITER_FOLIOQ.

   - Use folio queues in the write-side helpers instead of xarrays.

   - Add a function to reset the iterator in a subrequest.

   - Simplify the write-side helpers to use sheaves to skip gaps rather
     than trying to work out where gaps are.

   - In afs, make the read subrequests asynchronous, putting them into
     work items to allow the next patch to do progressive
     unlocking/reading.

   - Overhaul the read-side helpers to improve performance.

   - Fix the caching of a partial block at the end of a file.

   - Allow a store to be cancelled.

  Then some changes for cifs to make it use folio queues instead of
  xarrays for crypto bufferage:

   - Use raw iteration functions rather than manually coding iteration
     when hashing data.

   - Switch to using folio_queue for crypto buffers.

   - Remove the xarray bits.

  Make some adjustments to the /proc/fs/netfs/stats file such that:

   - All the netfs stats lines begin 'Netfs:' but change this to
     something a bit more useful.

   - Add a couple of stats counters to track the numbers of skips and
     waits on the per-inode writeback serialisation lock to make it
     easier to check for this as a source of performance loss.

  Miscellaneous work:

   - Ensure that the sb_writers lock is taken around
     vfs_{set,remove}xattr() in the cachefiles code.

   - Reduce the number of conditional branches in netfs_perform_write().

   - Move the CIFS_INO_MODIFIED_ATTR flag to the netfs_inode struct and
     remove cifs_post_modify().

   - Move the max_len/max_nr_segs members from netfs_io_subrequest to
     netfs_io_request as they're only needed for one subreq at a time.

   - Add an 'unknown' source value for tracing purposes.

   - Remove NETFS_COPY_TO_CACHE as it's no longer used.

   - Set the request work function up front at allocation time.

   - Use bh-disabling spinlocks for rreq->lock as cachefiles completion
     may be run from block-filesystem DIO completion in softirq context.

   - Remove fs/netfs/io.c"

* tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  docs: filesystems: corrected grammar of netfs page
  cifs: Don't support ITER_XARRAY
  cifs: Switch crypto buffer to use a folio_queue rather than an xarray
  cifs: Use iterate_and_advance*() routines directly for hashing
  netfs: Cancel dirty folios that have no storage destination
  cachefiles, netfs: Fix write to partial block at EOF
  netfs: Remove fs/netfs/io.c
  netfs: Speed up buffered reading
  afs: Make read subreqs async
  netfs: Simplify the writeback code
  netfs: Provide an iterator-reset function
  netfs: Use new folio_queue data type and iterator instead of xarray iter
  cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs
  iov_iter: Provide copy_folio_from_iter()
  mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios
  netfs: Use bh-disabling spinlocks for rreq->lock
  netfs: Set the request work function upon allocation
  netfs: Remove NETFS_COPY_TO_CACHE
  netfs: Reserve netfs_sreq_source 0 as unset/unknown
  netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
  ...
2024-09-16 12:13:31 +02:00
Linus Torvalds 2775df6e5e vfs-6.12.folio
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEvgAKCRCRxhvAZXjc
 ou77AQD3U1KjbdgzbUi6kaUmiiWOPhfYTlm8mho8dBjqvTCB+AD/XTWSFCWWhHB4
 KyQZTbjRD81xmVNbKjASazp0EA6Ahwc=
 =gIsD
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12.folio' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull vfs folio updates from Christian Brauner:
 "This contains work to port write_begin and write_end to rely on folios
  for various filesystems.

  This converts ocfs2, vboxfs, orangefs, jffs2, hostfs, fuse, f2fs,
  ecryptfs, ntfs3, nilfs2, reiserfs, minixfs, qnx6, sysv, ufs, and
  squashfs.

  After this series lands a bunch of the filesystems in this list do not
  mention struct page anymore"

* tag 'vfs-6.12.folio' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (61 commits)
  Squashfs: Ensure all readahead pages have been used
  Squashfs: Rewrite and update squashfs_readahead_fragment() to not use page->index
  Squashfs: Update squashfs_readpage_block() to not use page->index
  Squashfs: Update squashfs_readahead() to not use page->index
  Squashfs: Update page_actor to not use page->index
  jffs2: Use a folio in jffs2_garbage_collect_dnode()
  jffs2: Convert jffs2_do_readpage_nolock to take a folio
  buffer: Convert __block_write_begin() to take a folio
  ocfs2: Convert ocfs2_write_zero_page to use a folio
  fs: Convert aops->write_begin to take a folio
  fs: Convert aops->write_end to take a folio
  vboxsf: Use a folio in vboxsf_write_end()
  orangefs: Convert orangefs_write_begin() to use a folio
  orangefs: Convert orangefs_write_end() to use a folio
  jffs2: Convert jffs2_write_begin() to use a folio
  jffs2: Convert jffs2_write_end() to use a folio
  hostfs: Convert hostfs_write_end() to use a folio
  fuse: Convert fuse_write_begin() to use a folio
  fuse: Convert fuse_write_end() to use a folio
  f2fs: Convert f2fs_write_begin() to use a folio
  ...
2024-09-16 08:54:30 +02:00
Linus Torvalds 8f72c31f45 vfs-6.12.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEGwAKCRCRxhvAZXjc
 ojIuAQC433+hBkvjvmQ7H0r5rgZSjUuCTG3bSmdU7RJmPHUHhwEA85v/NGq53f+W
 IhandK6t+Cf0JYpFZ3N0bT88hDYVhQQ=
 =9zGL
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual pile of misc updates:

  Features:

   - Add F_CREATED_QUERY fcntl() that allows userspace to query whether
     a file was actually created. Often userspace wants to know whether
     an O_CREATE request did actually create a file without using
     O_EXCL. The current logic is that to first attempts to open the
     file without O_CREAT | O_EXCL and if ENOENT is returned userspace
     tries again with both flags. If that succeeds all is well. If it
     now reports EEXIST it retries.

     That works fairly well but some corner cases make this more
     involved. If this operates on a dangling symlink the first openat()
     without O_CREAT | O_EXCL will return ENOENT but the second openat()
     with O_CREAT | O_EXCL will fail with EEXIST.

     The reason is that openat() without O_CREAT | O_EXCL follows the
     symlink while O_CREAT | O_EXCL doesn't for security reasons. So
     it's not something we can really change unless we add an explicit
     opt-in via O_FOLLOW which seems really ugly.

     All available workarounds are really nasty (fanotify, bpf lsm etc)
     so add a simple fcntl().

   - Try an opportunistic lookup for O_CREAT. Today, when opening a file
     we'll typically do a fast lookup, but if O_CREAT is set, the kernel
     always takes the exclusive inode lock. This was likely done with
     the expectation that O_CREAT means that we always expect to do the
     create, but that's often not the case. Many programs set O_CREAT
     even in scenarios where the file already exists (see related
     F_CREATED_QUERY patch motivation above).

     The series contained in the pr rearranges the pathwalk-for-open
     code to also attempt a fast_lookup in certain O_CREAT cases. If a
     positive dentry is found, the inode_lock can be avoided altogether
     and it can stay in rcuwalk mode for the last step_into.

   - Expose the 64 bit mount id via name_to_handle_at()

     Now that we provide a unique 64-bit mount ID interface in statx(2),
     we can now provide a race-free way for name_to_handle_at(2) to
     provide a file handle and corresponding mount without needing to
     worry about racing with /proc/mountinfo parsing or having to open a
     file just to do statx(2).

     While this is not necessary if you are using AT_EMPTY_PATH and
     don't care about an extra statx(2) call, users that pass full paths
     into name_to_handle_at(2) need to know which mount the file handle
     comes from (to make sure they don't try to open_by_handle_at a file
     handle from a different filesystem) and switching to AT_EMPTY_PATH
     would require allocating a file for every name_to_handle_at(2) call

   - Add a per dentry expire timeout to autofs

     There are two fairly well known automounter map formats, the autofs
     format and the amd format (more or less System V and Berkley).

     Some time ago Linux autofs added an amd map format parser that
     implemented a fair amount of the amd functionality. This was done
     within the autofs infrastructure and some functionality wasn't
     implemented because it either didn't make sense or required extra
     kernel changes. The idea was to restrict changes to be within the
     existing autofs functionality as much as possible and leave changes
     with a wider scope to be considered later.

     One of these changes is implementing the amd options:
      1) "unmount", expire this mount according to a timeout (same as
         the current autofs default).
      2) "nounmount", don't expire this mount (same as setting the
         autofs timeout to 0 except only for this specific mount) .
      3) "utimeout=<seconds>", expire this mount using the specified
         timeout (again same as setting the autofs timeout but only for
         this mount)

     To implement these options per-dentry expire timeouts need to be
     implemented for autofs indirect mounts. This is because all map
     keys (mounts) for autofs indirect mounts use an expire timeout
     stored in the autofs mount super block info. structure and all
     indirect mounts use the same expire timeout.

  Fixes:

   - Fix missing fput for FSCONFIG_SET_FD in autofs

   - Use param->file for FSCONFIG_SET_FD in coda

   - Delete the 'fs/netfs' proc subtreee when netfs module exits

   - Make sure that struct uid_gid_map fits into a single cacheline

   - Don't flush in-flight wb switches for superblocks without cgroup
     writeback

   - Correcting the idmapping mount example in the idmapping
     documentation

   - Fix a race between evice_inodes() and find_inode() and iput()

   - Refine the show_inode_state() macro definition in writeback code

   - Prevent dump_mapping() from accessing invalid dentry.d_name.name

   - Show actual source for debugfs in /proc/mounts

   - Annotate data-race of busy_poll_usecs in eventpoll

   - Don't WARN for racy path_noexec check in exec code

   - Handle OOM on mnt_warn_timestamp_expiry()

   - Fix some spelling in the iomap design documentation

   - Fix typo in procfs comment

   - Fix typo in fs/namespace.c comment

  Cleanups:

   - Add the VFS git tree to the MAINTAINERS file

   - Move FMODE_UNSIGNED_OFFSET to fop_flags freeing up another f_mode
     bit in struct file bringing us to 5 free f_mode bits

   - Remove the __I_DIO_WAKEUP bit from i_state flags as we can simplify
     the wait mechanism

   - Remove the unused path_put_init() helper

   - Replace a __u32 with u32 for s_fsnotify_mask as __u32 is uapi
     specific

   - Replace the unsigned long i_state member with a u32 i_state member
     in struct inode freeing up 4 bytes in struct inode. Instead of
     using the bit based wait apis we're now using the var event apis
     and using the individual bytes of the i_state member to wait on
     state changes

   - Explain how per-syscall AT_* flags should be allocated

   - Use in_group_or_capable() helper to simplify the posix acl mode
     update code

   - Switch to LIST_HEAD() in fsync_buffers_list() to simplify the code

   - Removed comment about d_rcu_to_refcount() as that function doesn't
     exist anymore

   - Add kernel documentation for lookup_fast()

   - Don't re-zero evenpoll fields

   - Remove outdated comment after close_fd()

   - Fix imprecise wording in comment about the pipe filesystem

   - Drop GFP_NOFAIL mode from alloc_page_buffers

   - Missing blank line warnings and struct declaration improved in
     file_table

   - Annotate struct poll_list with __counted_by()

   - Remove the unused read parameter in percpu-rwsem

   - Remove linux/prefetch.h include from direct-io code

   - Use kmemdup_array instead of kmemdup for multiple allocation in
     mnt_idmapping code

   - Remove unused mnt_cursor_del() declaration

  Performance tweaks:

   - Dodge smp_mb in break_lease and break_deleg in the common case

   - Only read fops once in fops_{get,put}()

   - Use RCU in ilookup()

   - Elide smp_mb in iversion handling in the common case

   - Drop one lock trip in evict()"

* tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (58 commits)
  uidgid: make sure we fit into one cacheline
  proc: Fix typo in the comment
  fs/pipe: Correct imprecise wording in comment
  fhandle: expose u64 mount id to name_to_handle_at(2)
  uapi: explain how per-syscall AT_* flags should be allocated
  fs: drop GFP_NOFAIL mode from alloc_page_buffers
  writeback: Refine the show_inode_state() macro definition
  fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name
  mnt_idmapping: Use kmemdup_array instead of kmemdup for multiple allocation
  netfs: Delete subtree of 'fs/netfs' when netfs module exits
  fs: use LIST_HEAD() to simplify code
  inode: make i_state a u32
  inode: port __I_LRU_ISOLATING to var event
  vfs: fix race between evice_inodes() and find_inode()&iput()
  inode: port __I_NEW to var event
  inode: port __I_SYNC to var event
  fs: reorder i_state bits
  fs: add i_state helpers
  MAINTAINERS: add the VFS git tree
  fs: s/__u32/u32/ for s_fsnotify_mask
  ...
2024-09-16 08:35:09 +02:00
Pankaj Raghav 71fdfcdd0d
Documentation: iomap: fix a typo
Change voidw -> void.

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20240820161329.1293718-1-kernel@pankajraghav.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12 14:07:17 +02:00
Dennis Lam 4b40d43d9f
docs: filesystems: corrected grammar of netfs page
Fixed the word "aren't" to "isn't" based on singular word "bufferage".

Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com>
Link: https://lore.kernel.org/r/20240912012550.13748-2-dennis.lamerice@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12 12:20:43 +02:00
Bagas Sanjaya 0088d7581b tools: usb: p9_fwd: wrap USBG shell command examples in literal code blocks
Stephen Rothwell reported htmldocs warning when merging usb tree:

Documentation/filesystems/9p.rst:99: ERROR: Unexpected indentation.

That's because Sphinx tries rendering p9_fwd.py output as a grid table
instead.

Wrap shell commands in "USBG Example" section in literal code blocks
to fix above warning and to be in line with rest of commands in the doc.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20240905184059.0f30ff9a@canb.auug.org.au/
Fixes: 673f0c3ffc ("tools: usb: p9_fwd: add usb gadget packet forwarder script")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Link: https://lore.kernel.org/r/20240908113423.158352-1-bagasdotme@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-11 15:33:47 +02:00
Dennis Lam 2409952f64 docs:filesystems: fix spelling and grammar mistakes
Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20240906195400.39949-1-dennis.lamerice@gmail.com>
2024-09-10 15:36:50 -06:00
Dennis Lam 0cac9253a0 docs:filesystem: fix mispelled words on autofs page
Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20240908183741.15352-2-dennis.lamerice@gmail.com>
2024-09-10 15:35:36 -06:00
Xiaxi Shen d89b35d83e bcachefs: Fix a spelling error in docs
Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 09:41:48 -04:00
Dennis Lam b1daf3f847
docs:filesystems: fix spelling and grammar mistakes in iomap design page
Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com>
Link: https://lore.kernel.org/r/20240908172841.9616-2-dennis.lamerice@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-09 09:53:13 +02:00
Yuriy Belikov 930b7c32ea overlayfs.rst: update metacopy section in overlayfs documentation
- Provide info about trusted.overlay.metacopy extended attribute
- Minor rephrasing regarding copy-up operation with metacopy=on

Signed-off-by: Yuriy Belikov <yuriybelikov1@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-09-08 15:36:38 +02:00
Jonathan Corbet d224338aa1 Linux 6.11-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmbUG7oeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG7LUH/26M4QJ5UGJHsehd
 bbHlE4or0jibFyMbUiYDOElqLITjCVH6mi3Kv3E7sfyLxSsglVRRNzLCTq/UgTf8
 E1L90q4wCySElzzIhH6cltuQdAhs7pRWs5BETByvIW+g+ayN0LZxUPbvB8yl/nOU
 Zx8flBEuM2isuRlnx+iRccbf2PxNadSkSYg2TlmZr8mfFKCiRxjU7x355Q3UcylQ
 b8S2jVgq69CSDF3IBOzwHZjdq5OceDsO8he0KcfSTvSgyFMcwhntAT397YEnFXnk
 KKjKPNCu3KqHtTxsi4Sc0wOxVcgctDv4OPethaL8yROQ7jdBTkvNpPT1yMf7bca8
 ZLpSo5Y=
 =TBcj
 -----END PGP SIGNATURE-----

Merge tag 'v6.11-rc6' into docs-mw

This is done primarily to get a docs build fix merged via another tree so
that "make htmldocs" stops failing.
2024-09-05 14:01:38 -06:00
Michael Grzeschik 673f0c3ffc tools: usb: p9_fwd: add usb gadget packet forwarder script
This patch is adding an small python tool to forward 9pfs requests
from the USB gadget to an existing 9pfs TCP server. Since currently all
9pfs servers lack support for the usb transport this tool is an useful
helper to get started.

Refer the Documentation section "USBG Example" in
Documentation/filesystems/9p.rst on how to use it.

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Tested-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Link: https://lore.kernel.org/r/20240116-ml-topic-u9p-v12-3-9a27de5160e0@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-03 09:57:08 +02:00
Michael Grzeschik a3be076dc1 net/9p/usbg: Add new usb gadget function transport
Add the new gadget function for 9pfs transport. This function is
defining an simple 9pfs transport interface that consists of one in and
one out endpoint. The endpoints transmit and receive the 9pfs protocol
payload when mounting a 9p filesystem over usb.

Tested-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Link: https://lore.kernel.org/r/20240116-ml-topic-u9p-v12-2-9a27de5160e0@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-03 09:57:08 +02:00
Matthew Wilcox (Oracle) 420e05d0de fs: remove calls to set and clear the folio error flag
Nobody checks the folio error flag any more, so we can stop setting and
clearing it.  Also remove the documentation suggesting to not bother
setting the error bit.

Link: https://lkml.kernel.org/r/20240807193528.1865100-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:26:04 -07:00
Hongbo Li 3717bbcb59 doc: correcting the idmapping mount example
In step 2, we obtain the kernel id `k1000`. So in next step (step
3), we should translate the `k1000` not `k21000`.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20240816063611.1961910-1-lihongbo22@huawei.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30 08:22:37 +02:00
Xiaxi Shen 28c7658b2c Fix spelling and gramatical errors
Fixed 3 typos in design.rst

Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com>
Link: https://lore.kernel.org/r/20240807070536.14536-1-shenxiaxi26@gmail.com
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30 08:22:34 +02:00