fuse update for 6.16

Merge tag 'fuse-update-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Remove tmp page copying in writeback path (Joanne). This removes
   ~300 lines and with that a lot of complexity related to avoiding
   reclaim related deadlock. The old mechanism is replaced with a
   mapping flag that tells the MM not to block reclaim waiting for
   writeback to complete. The MM parts have been reviewed/acked by
   respective maintainers.

 - Convert more code to handle large folios (Joanne). This still just
   adds the code to deal with large folios and does not enable them
   yet.

 - Allow invalidating all cached lookups atomically (Luis Henriques).
   This feature is useful for CernVMFS, which currently does this
   iteratively.

 - Align write prefaulting in fuse with generic one (Dave Hansen)

 - Fix race causing invalid data to be cached when setting attributes
   on different nodes of a distributed fs (Guang Yuan Wu)

 - Update documentation for passthrough (Chen Linxuan)

 - Add fdinfo about the device number associated with an opened
   /dev/fuse instance (Chen Linxuan)

 - Increase readdir buffer size (Miklos). This depends on a patch to
   VFS readdir code that was already merged through Christian's tree.

 - Optimize io-uring request expiration (Joanne)

 - Misc cleanups

* tag 'fuse-update-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
  fuse: increase readdir buffer size
  readdir: supply dir_context.count as readdir buffer size hint
  fuse: don't allow signals to interrupt getdents copying
  fuse: support large folios for writeback
  fuse: support large folios for readahead
  fuse: support large folios for queued writes
  fuse: support large folios for stores
  fuse: support large folios for symlinks
  fuse: support large folios for folio reads
  fuse: support large folios for writethrough writes
  fuse: refactor fuse_fill_write_pages()
  fuse: support large folios for retrieves
  fuse: support copying large folios
  fs: fuse: add dev id to /dev/fuse fdinfo
  docs: filesystems: add fuse-passthrough.rst
  MAINTAINERS: update filter of FUSE documentation
  fuse: fix race between concurrent setattrs from multiple nodes
  fuse: remove tmp folio for writebacks and internal rb tree
  mm: skip folio reclaim in legacy memcg contexts for deadlockable mappings
  fuse: optimize over-io-uring request expiration check
  ...

commit 2619a6d413
Documentation/filesystems/fuse-passthrough.rst (new file, 133 lines):

.. SPDX-License-Identifier: GPL-2.0

================
FUSE Passthrough
================

Introduction
============

FUSE (Filesystem in Userspace) passthrough is a feature designed to improve the
performance of FUSE filesystems for I/O operations. Typically, FUSE operations
involve communication between the kernel and a userspace FUSE daemon, which can
incur overhead. Passthrough allows certain operations on a FUSE file to bypass
the userspace daemon and be executed directly by the kernel on an underlying
"backing file".

This is achieved by the FUSE daemon registering a file descriptor (pointing to
the backing file on a lower filesystem) with the FUSE kernel module. The kernel
then receives an identifier (``backing_id``) for this registered backing file.
When a FUSE file is subsequently opened, the FUSE daemon can, in its response to
the ``OPEN`` request, include this ``backing_id`` and set the
``FOPEN_PASSTHROUGH`` flag. This establishes a direct link for specific
operations.

Currently, passthrough is supported for operations like ``read(2)``/``write(2)``
(via ``read_iter``/``write_iter``), ``splice(2)``, and ``mmap(2)``.

Enabling Passthrough
====================

To use FUSE passthrough (a minimal userspace sketch follows this list):

1. The FUSE filesystem must be compiled with ``CONFIG_FUSE_PASSTHROUGH``
   enabled.
2. The FUSE daemon, during the ``FUSE_INIT`` handshake, must negotiate the
   ``FUSE_PASSTHROUGH`` capability and specify its desired
   ``max_stack_depth``.
3. The (privileged) FUSE daemon uses the ``FUSE_DEV_IOC_BACKING_OPEN`` ioctl
   on its connection file descriptor (e.g., ``/dev/fuse``) to register a
   backing file descriptor and obtain a ``backing_id``.
4. When handling an ``OPEN`` or ``CREATE`` request for a FUSE file, the daemon
   replies with the ``FOPEN_PASSTHROUGH`` flag set in
   ``fuse_open_out::open_flags`` and provides the corresponding ``backing_id``
   in ``fuse_open_out::backing_id``.
5. The FUSE daemon should eventually call ``FUSE_DEV_IOC_BACKING_CLOSE`` with
   the ``backing_id`` to release the kernel's reference to the backing file
   when it's no longer needed for passthrough setups.
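
A minimal daemon-side sketch of steps 3-5, assuming the definitions provided
by ``include/uapi/linux/fuse.h`` (``struct fuse_backing_map``,
``FOPEN_PASSTHROUGH``, and the two ioctls); error handling and the
surrounding request loop are omitted:

.. code-block:: c

   #include <sys/ioctl.h>
   #include <linux/fuse.h>

   /* Step 3: register a backing fd; the return value is the backing_id. */
   static int register_backing(int fuse_dev_fd, int backing_fd)
   {
           struct fuse_backing_map map = { .fd = backing_fd };

           return ioctl(fuse_dev_fd, FUSE_DEV_IOC_BACKING_OPEN, &map);
   }

   /* Step 4: reply to OPEN/CREATE with passthrough enabled. */
   static void fill_open_reply(struct fuse_open_out *out, int backing_id)
   {
           out->open_flags |= FOPEN_PASSTHROUGH;
           out->backing_id = backing_id;
   }

   /* Step 5: drop the kernel's reference once no new opens need it. */
   static int close_backing(int fuse_dev_fd, int backing_id)
   {
           return ioctl(fuse_dev_fd, FUSE_DEV_IOC_BACKING_CLOSE, &backing_id);
   }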

Privilege Requirements
======================

Setting up passthrough functionality currently requires the FUSE daemon to
possess the ``CAP_SYS_ADMIN`` capability. This requirement stems from several
security and resource management considerations that are actively being
discussed and worked on. The primary reasons for this restriction are detailed
below.

Resource Accounting and Visibility
----------------------------------

The core mechanism for passthrough involves the FUSE daemon opening a file
descriptor to a backing file and registering it with the FUSE kernel module via
the ``FUSE_DEV_IOC_BACKING_OPEN`` ioctl. This ioctl returns a ``backing_id``
associated with a kernel-internal ``struct fuse_backing`` object, which holds a
reference to the backing ``struct file``.

A significant concern arises because the FUSE daemon can close its own file
descriptor to the backing file after registration. The kernel, however, will
still hold a reference to the ``struct file`` via the ``struct fuse_backing``
object as long as it's associated with a ``backing_id`` (or subsequently, with
an open FUSE file in passthrough mode).

This behavior leads to two main issues for unprivileged FUSE daemons:

1. **Invisibility to lsof and other inspection tools**: Once the FUSE
   daemon closes its file descriptor, the open backing file held by the kernel
   becomes "hidden." Standard tools like ``lsof``, which typically inspect
   process file descriptor tables, would not be able to identify that this
   file is still open by the system on behalf of the FUSE filesystem. This
   makes it difficult for system administrators to track resource usage or
   debug issues related to open files (e.g., preventing unmounts).

2. **Bypassing RLIMIT_NOFILE**: The FUSE daemon process is subject to
   resource limits, including the maximum number of open file descriptors
   (``RLIMIT_NOFILE``). If an unprivileged daemon could register backing files
   and then close its own FDs, it could potentially cause the kernel to hold
   an unlimited number of open ``struct file`` references without these being
   accounted against the daemon's ``RLIMIT_NOFILE``. This could lead to a
   denial-of-service (DoS) by exhausting system-wide file resources.

The ``CAP_SYS_ADMIN`` requirement acts as a safeguard against these issues,
restricting this powerful capability to trusted processes.

**NOTE**: ``io_uring`` solves this similar issue by exposing its "fixed files",
which are visible via ``fdinfo`` and accounted under the registering user's
``RLIMIT_NOFILE``.

Filesystem Stacking and Shutdown Loops
--------------------------------------

Another concern relates to the potential for creating complex and problematic
filesystem stacking scenarios if unprivileged users could set up passthrough.
A FUSE passthrough filesystem might use a backing file that resides:

* On the *same* FUSE filesystem.
* On another filesystem (like OverlayFS) which itself might have an upper or
  lower layer that is a FUSE filesystem.

These configurations could create dependency loops, particularly during
filesystem shutdown or unmount sequences, leading to deadlocks or system
instability. This is conceptually similar to the risks associated with the
``LOOP_SET_FD`` ioctl, which also requires ``CAP_SYS_ADMIN``.

To mitigate this, FUSE passthrough already incorporates checks based on
filesystem stacking depth (``sb->s_stack_depth`` and ``fc->max_stack_depth``).
For example, during the ``FUSE_INIT`` handshake, the FUSE daemon can negotiate
the ``max_stack_depth`` it supports. When a backing file is registered via
``FUSE_DEV_IOC_BACKING_OPEN``, the kernel checks if the backing file's
filesystem stack depth is within the allowed limit.

The ``CAP_SYS_ADMIN`` requirement provides an additional layer of security,
ensuring that only privileged users can create these potentially complex
stacking arrangements.

General Security Posture
------------------------

As a general principle for new kernel features that allow userspace to instruct
the kernel to perform direct operations on its behalf based on user-provided
file descriptors, starting with a higher privilege requirement (like
``CAP_SYS_ADMIN``) is a conservative and common security practice. This allows
the feature to be used and tested while further security implications are
evaluated and addressed.

Documentation/filesystems/index.rst
@@ -99,6 +99,7 @@ Documentation for filesystem implementations.
    fuse
    fuse-io
    fuse-io-uring
+   fuse-passthrough
    inotify
    isofs
    nilfs2
MAINTAINERS
@@ -9846,7 +9846,7 @@ L:	linux-fsdevel@vger.kernel.org
 S:	Maintained
 W:	https://github.com/libfuse/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git
-F:	Documentation/filesystems/fuse.rst
+F:	Documentation/filesystems/fuse*
 F:	fs/fuse/
 F:	include/uapi/linux/fuse.h
fs/fuse/dev.c (182 lines changed):
@@ -23,6 +23,7 @@
 #include <linux/swap.h>
 #include <linux/splice.h>
 #include <linux/sched.h>
+#include <linux/seq_file.h>

 #define CREATE_TRACE_POINTS
 #include "fuse_trace.h"
@@ -45,7 +46,7 @@ bool fuse_request_expired(struct fuse_conn *fc, struct list_head *list)
 	return time_is_before_jiffies(req->create_time + fc->timeout.req_timeout);
 }

-bool fuse_fpq_processing_expired(struct fuse_conn *fc, struct list_head *processing)
+static bool fuse_fpq_processing_expired(struct fuse_conn *fc, struct list_head *processing)
 {
 	int i;

@@ -816,7 +817,7 @@ static int unlock_request(struct fuse_req *req)
 	return err;
 }

-void fuse_copy_init(struct fuse_copy_state *cs, int write,
+void fuse_copy_init(struct fuse_copy_state *cs, bool write,
 		    struct iov_iter *iter)
 {
 	memset(cs, 0, sizeof(*cs));
@@ -955,10 +956,10 @@ static int fuse_check_folio(struct folio *folio)
  * folio that was originally in @pagep will lose a reference and the new
  * folio returned in @pagep will carry a reference.
  */
-static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
+static int fuse_try_move_folio(struct fuse_copy_state *cs, struct folio **foliop)
 {
 	int err;
-	struct folio *oldfolio = page_folio(*pagep);
+	struct folio *oldfolio = *foliop;
 	struct folio *newfolio;
 	struct pipe_buffer *buf = cs->pipebufs;

@@ -979,7 +980,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
 	cs->pipebufs++;
 	cs->nr_segs--;

-	if (cs->len != PAGE_SIZE)
+	if (cs->len != folio_size(oldfolio))
 		goto out_fallback;

 	if (!pipe_buf_try_steal(cs->pipe, buf))
@@ -1025,7 +1026,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
 	if (test_bit(FR_ABORTED, &cs->req->flags))
 		err = -ENOENT;
 	else
-		*pagep = &newfolio->page;
+		*foliop = newfolio;
 	spin_unlock(&cs->req->waitq.lock);

 	if (err) {
@@ -1058,8 +1059,8 @@ out_fallback:
 	goto out_put_old;
 }

-static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
-			 unsigned offset, unsigned count)
+static int fuse_ref_folio(struct fuse_copy_state *cs, struct folio *folio,
+			  unsigned offset, unsigned count)
 {
 	struct pipe_buffer *buf;
 	int err;
@@ -1067,17 +1068,17 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
 	if (cs->nr_segs >= cs->pipe->max_usage)
 		return -EIO;

-	get_page(page);
+	folio_get(folio);
 	err = unlock_request(cs->req);
 	if (err) {
-		put_page(page);
+		folio_put(folio);
 		return err;
 	}

 	fuse_copy_finish(cs);

 	buf = cs->pipebufs;
-	buf->page = page;
+	buf->page = &folio->page;
 	buf->offset = offset;
 	buf->len = count;

@@ -1089,20 +1090,24 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
 }

 /*
- * Copy a page in the request to/from the userspace buffer.  Must be
+ * Copy a folio in the request to/from the userspace buffer.  Must be
  * done atomically
  */
-static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
-			  unsigned offset, unsigned count, int zeroing)
+static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop,
+			   unsigned offset, unsigned count, int zeroing)
 {
 	int err;
-	struct page *page = *pagep;
+	struct folio *folio = *foliop;
+	size_t size;

-	if (page && zeroing && count < PAGE_SIZE)
-		clear_highpage(page);
+	if (folio) {
+		size = folio_size(folio);
+		if (zeroing && count < size)
+			folio_zero_range(folio, 0, size);
+	}

 	while (count) {
-		if (cs->write && cs->pipebufs && page) {
+		if (cs->write && cs->pipebufs && folio) {
 			/*
 			 * Can't control lifetime of pipe buffers, so always
 			 * copy user pages.
@@ -1112,12 +1117,12 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
 			if (err)
 				return err;
 		} else {
-			return fuse_ref_page(cs, page, offset, count);
+			return fuse_ref_folio(cs, folio, offset, count);
 		}
 	} else if (!cs->len) {
-		if (cs->move_pages && page &&
-		    offset == 0 && count == PAGE_SIZE) {
-			err = fuse_try_move_page(cs, pagep);
+		if (cs->move_folios && folio &&
+		    offset == 0 && count == size) {
+			err = fuse_try_move_folio(cs, foliop);
 			if (err <= 0)
 				return err;
 		} else {
@@ -1126,22 +1131,30 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
 				return err;
 			}
 		}
-		if (page) {
-			void *mapaddr = kmap_local_page(page);
-			void *buf = mapaddr + offset;
-			offset += fuse_copy_do(cs, &buf, &count);
+		if (folio) {
+			void *mapaddr = kmap_local_folio(folio, offset);
+			void *buf = mapaddr;
+			unsigned int copy = count;
+			unsigned int bytes_copied;
+
+			if (folio_test_highmem(folio) && count > PAGE_SIZE - offset_in_page(offset))
+				copy = PAGE_SIZE - offset_in_page(offset);
+
+			bytes_copied = fuse_copy_do(cs, &buf, &copy);
 			kunmap_local(mapaddr);
+			offset += bytes_copied;
+			count -= bytes_copied;
 		} else
 			offset += fuse_copy_do(cs, NULL, &count);
 	}
-	if (page && !cs->write)
-		flush_dcache_page(page);
+	if (folio && !cs->write)
+		flush_dcache_folio(folio);
 	return 0;
 }

-/* Copy pages in the request to/from userspace buffer */
-static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes,
-			   int zeroing)
+/* Copy folios in the request to/from userspace buffer */
+static int fuse_copy_folios(struct fuse_copy_state *cs, unsigned nbytes,
+			    int zeroing)
 {
 	unsigned i;
 	struct fuse_req *req = cs->req;
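
[Review note] kmap_local_folio() maps only a single page of a large folio on
HIGHMEM kernels, which is why the copy length above is clamped to the remainder
of the currently mapped page before calling fuse_copy_do(); the surrounding
while (count) loop then re-maps at the new offset for the next page. On
!HIGHMEM configurations the whole folio is linearly addressable and no clamping
takes place.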
@@ -1151,23 +1164,12 @@ static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes,
 		int err;
 		unsigned int offset = ap->descs[i].offset;
 		unsigned int count = min(nbytes, ap->descs[i].length);
-		struct page *orig, *pagep;
-
-		orig = pagep = &ap->folios[i]->page;
-
-		err = fuse_copy_page(cs, &pagep, offset, count, zeroing);
+		err = fuse_copy_folio(cs, &ap->folios[i], offset, count, zeroing);
 		if (err)
 			return err;
-
 		nbytes -= count;
-
-		/*
-		 * fuse_copy_page may have moved a page from a pipe instead of
-		 * copying into our given page, so update the folios if it was
-		 * replaced.
-		 */
-		if (pagep != orig)
-			ap->folios[i] = page_folio(pagep);
 	}
 	return 0;
 }
@@ -1197,7 +1199,7 @@ int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
 	for (i = 0; !err && i < numargs; i++) {
 		struct fuse_arg *arg = &args[i];
 		if (i == numargs - 1 && argpages)
-			err = fuse_copy_pages(cs, arg->size, zeroing);
+			err = fuse_copy_folios(cs, arg->size, zeroing);
 		else
 			err = fuse_copy_one(cs, arg->value, arg->size);
 	}
@@ -1538,7 +1540,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, struct iov_iter *to)
 	if (!user_backed_iter(to))
 		return -EINVAL;

-	fuse_copy_init(&cs, 1, to);
+	fuse_copy_init(&cs, true, to);

 	return fuse_dev_do_read(fud, file, &cs, iov_iter_count(to));
 }
@@ -1561,7 +1563,7 @@ static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
 	if (!bufs)
 		return -ENOMEM;

-	fuse_copy_init(&cs, 1, NULL);
+	fuse_copy_init(&cs, true, NULL);
 	cs.pipebufs = bufs;
 	cs.pipe = pipe;
 	ret = fuse_dev_do_read(fud, in, &cs, len);
@@ -1786,20 +1788,23 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
 	num = outarg.size;
 	while (num) {
 		struct folio *folio;
-		struct page *page;
-		unsigned int this_num;
+		unsigned int folio_offset;
+		unsigned int nr_bytes;
+		unsigned int nr_pages;

 		folio = filemap_grab_folio(mapping, index);
 		err = PTR_ERR(folio);
 		if (IS_ERR(folio))
 			goto out_iput;

-		page = &folio->page;
-		this_num = min_t(unsigned, num, folio_size(folio) - offset);
-		err = fuse_copy_page(cs, &page, offset, this_num, 0);
+		folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
+		nr_bytes = min_t(unsigned, num, folio_size(folio) - folio_offset);
+		nr_pages = (offset + nr_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+		err = fuse_copy_folio(cs, &folio, folio_offset, nr_bytes, 0);
 		if (!folio_test_uptodate(folio) && !err && offset == 0 &&
-		    (this_num == folio_size(folio) || file_size == end)) {
-			folio_zero_segment(folio, this_num, folio_size(folio));
+		    (nr_bytes == folio_size(folio) || file_size == end)) {
+			folio_zero_segment(folio, nr_bytes, folio_size(folio));
 			folio_mark_uptodate(folio);
 		}
 		folio_unlock(folio);
@@ -1808,9 +1813,9 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
 		if (err)
 			goto out_iput;

-		num -= this_num;
+		num -= nr_bytes;
 		offset = 0;
-		index++;
+		index += nr_pages;
 	}

 	err = 0;
@@ -1849,7 +1854,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	unsigned int num;
 	unsigned int offset;
 	size_t total_len = 0;
-	unsigned int num_pages, cur_pages = 0;
+	unsigned int num_pages;
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_retrieve_args *ra;
 	size_t args_size = sizeof(*ra);
@@ -1867,6 +1872,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	num_pages = min(num_pages, fc->max_pages);
+	num = min(num, num_pages << PAGE_SHIFT);

 	args_size += num_pages * (sizeof(ap->folios[0]) + sizeof(ap->descs[0]));
@@ -1887,25 +1893,29 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	index = outarg->offset >> PAGE_SHIFT;

-	while (num && cur_pages < num_pages) {
+	while (num) {
 		struct folio *folio;
-		unsigned int this_num;
+		unsigned int folio_offset;
+		unsigned int nr_bytes;
+		unsigned int nr_pages;

 		folio = filemap_get_folio(mapping, index);
 		if (IS_ERR(folio))
 			break;

-		this_num = min_t(unsigned, num, PAGE_SIZE - offset);
+		folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
+		nr_bytes = min(folio_size(folio) - folio_offset, num);
+		nr_pages = (offset + nr_bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
 		ap->folios[ap->num_folios] = folio;
-		ap->descs[ap->num_folios].offset = offset;
-		ap->descs[ap->num_folios].length = this_num;
+		ap->descs[ap->num_folios].offset = folio_offset;
+		ap->descs[ap->num_folios].length = nr_bytes;
 		ap->num_folios++;
-		cur_pages++;

 		offset = 0;
-		num -= this_num;
-		total_len += this_num;
-		index++;
+		num -= nr_bytes;
+		total_len += nr_bytes;
+		index += nr_pages;
 	}
 	ra->inarg.offset = outarg->offset;
 	ra->inarg.size = total_len;
@@ -2021,11 +2031,24 @@ static int fuse_notify_resend(struct fuse_conn *fc)
 	return 0;
 }

+/*
+ * Increments the fuse connection epoch.  This results in dentries from
+ * previous epochs being invalidated.
+ *
+ * XXX optimization: add call to shrink_dcache_sb()?
+ */
+static int fuse_notify_inc_epoch(struct fuse_conn *fc)
+{
+	atomic_inc(&fc->epoch);
+
+	return 0;
+}
+
 static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
 		       unsigned int size, struct fuse_copy_state *cs)
 {
-	/* Don't try to move pages (yet) */
-	cs->move_pages = 0;
+	/* Don't try to move folios (yet) */
+	cs->move_folios = false;

 	switch (code) {
 	case FUSE_NOTIFY_POLL:
@@ -2049,6 +2072,9 @@ static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
 	case FUSE_NOTIFY_RESEND:
 		return fuse_notify_resend(fc);

+	case FUSE_NOTIFY_INC_EPOCH:
+		return fuse_notify_inc_epoch(fc);
+
 	default:
 		fuse_copy_finish(cs);
 		return -EINVAL;
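
[Review note] For context, a sketch of how a daemon could trigger the new
notification from userspace. FUSE notifications are ordinary writes to
/dev/fuse with unique == 0 and the notify code in the error field;
FUSE_NOTIFY_INC_EPOCH carries no payload. Illustrative only, and assumes a
<linux/fuse.h> that already defines the new code:

#include <unistd.h>
#include <linux/fuse.h>

static int notify_inc_epoch(int fuse_dev_fd)
{
	struct fuse_out_header out = {
		.len = sizeof(out),
		.error = FUSE_NOTIFY_INC_EPOCH,	/* notify code, not an errno */
		.unique = 0,			/* 0 marks a notification */
	};

	return write(fuse_dev_fd, &out, sizeof(out)) == sizeof(out) ? 0 : -1;
}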
@@ -2173,7 +2199,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	spin_unlock(&fpq->lock);
 	cs->req = req;
 	if (!req->args->page_replace)
-		cs->move_pages = 0;
+		cs->move_folios = false;

 	if (oh.error)
 		err = nbytes != sizeof(oh) ? -EINVAL : 0;
@@ -2211,7 +2237,7 @@ static ssize_t fuse_dev_write(struct kiocb *iocb, struct iov_iter *from)
 	if (!user_backed_iter(from))
 		return -EINVAL;

-	fuse_copy_init(&cs, 0, from);
+	fuse_copy_init(&cs, false, from);

 	return fuse_dev_do_write(fud, &cs, iov_iter_count(from));
 }
@@ -2285,13 +2311,13 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 	}
 	pipe_unlock(pipe);

-	fuse_copy_init(&cs, 0, NULL);
+	fuse_copy_init(&cs, false, NULL);
 	cs.pipebufs = bufs;
 	cs.nr_segs = nbuf;
 	cs.pipe = pipe;

 	if (flags & SPLICE_F_MOVE)
-		cs.move_pages = 1;
+		cs.move_folios = true;

 	ret = fuse_dev_do_write(fud, &cs, len);
@@ -2602,6 +2628,17 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 	}
 }

+#ifdef CONFIG_PROC_FS
+static void fuse_dev_show_fdinfo(struct seq_file *seq, struct file *file)
+{
+	struct fuse_dev *fud = fuse_get_dev(file);
+
+	if (!fud)
+		return;
+
+	seq_printf(seq, "fuse_connection:\t%u\n", fud->fc->dev);
+}
+#endif
+
 const struct file_operations fuse_dev_operations = {
 	.owner		= THIS_MODULE,
 	.open		= fuse_dev_open,
@@ -2617,6 +2654,9 @@ const struct file_operations fuse_dev_operations = {
 #ifdef CONFIG_FUSE_IO_URING
 	.uring_cmd	= fuse_uring_cmd,
 #endif
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo	= fuse_dev_show_fdinfo,
+#endif
 };
 EXPORT_SYMBOL_GPL(fuse_dev_operations);
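
[Review note] A userspace sketch of consuming the new fdinfo field
(hypothetical helper, not part of this series). The printed value is the
connection's device number, which should be the same number that names the
connection's directory under /sys/fs/fuse/connections/ (an assumption based on
the existing fuse sysfs layout):

#include <stdio.h>

static int fuse_connection_id(int fd)
{
	char path[64], line[256];
	unsigned int dev;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		/* fuse_dev_show_fdinfo() emits "fuse_connection:\t<dev>" */
		if (sscanf(line, "fuse_connection: %u", &dev) == 1) {
			fclose(f);
			return (int)dev;
		}
	}
	fclose(f);
	return -1;
}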
fs/fuse/dev_uring.c:
@@ -140,6 +140,21 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring)
 	}
 }

+static bool ent_list_request_expired(struct fuse_conn *fc, struct list_head *list)
+{
+	struct fuse_ring_ent *ent;
+	struct fuse_req *req;
+
+	ent = list_first_entry_or_null(list, struct fuse_ring_ent, list);
+	if (!ent)
+		return false;
+
+	req = ent->fuse_req;
+
+	return time_is_before_jiffies(req->create_time +
+				      fc->timeout.req_timeout);
+}
+
 bool fuse_uring_request_expired(struct fuse_conn *fc)
 {
 	struct fuse_ring *ring = fc->ring;
@@ -157,7 +172,8 @@ bool fuse_uring_request_expired(struct fuse_conn *fc)
 		spin_lock(&queue->lock);
 		if (fuse_request_expired(fc, &queue->fuse_req_queue) ||
 		    fuse_request_expired(fc, &queue->fuse_req_bg_queue) ||
-		    fuse_fpq_processing_expired(fc, queue->fpq.processing)) {
+		    ent_list_request_expired(fc, &queue->ent_w_req_queue) ||
+		    ent_list_request_expired(fc, &queue->ent_in_userspace)) {
			spin_unlock(&queue->lock);
			return true;
		}
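
[Review note] The list_move() to list_move_tail() conversions in the hunks
below are what make this O(1) check valid: they keep ent_w_req_queue and
ent_in_userspace ordered by request creation time, so if the oldest entry at
the head of each list has not timed out, nothing behind it can have either,
and the expiration check no longer has to walk every in-flight request.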
@@ -494,7 +510,7 @@ static void fuse_uring_cancel(struct io_uring_cmd *cmd,
 	spin_lock(&queue->lock);
 	if (ent->state == FRRS_AVAILABLE) {
 		ent->state = FRRS_USERSPACE;
-		list_move(&ent->list, &queue->ent_in_userspace);
+		list_move_tail(&ent->list, &queue->ent_in_userspace);
 		need_cmd_done = true;
 		ent->cmd = NULL;
 	}
@@ -577,8 +593,8 @@ static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
 	if (err)
 		return err;

-	fuse_copy_init(&cs, 0, &iter);
-	cs.is_uring = 1;
+	fuse_copy_init(&cs, false, &iter);
+	cs.is_uring = true;
 	cs.req = req;

 	return fuse_copy_out_args(&cs, args, ring_in_out.payload_sz);
@@ -607,8 +623,8 @@ static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req,
 		return err;
 	}

-	fuse_copy_init(&cs, 1, &iter);
-	cs.is_uring = 1;
+	fuse_copy_init(&cs, true, &iter);
+	cs.is_uring = true;
 	cs.req = req;

 	if (num_args > 0) {
@@ -714,7 +730,7 @@ static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ent,
 	cmd = ent->cmd;
 	ent->cmd = NULL;
 	ent->state = FRRS_USERSPACE;
-	list_move(&ent->list, &queue->ent_in_userspace);
+	list_move_tail(&ent->list, &queue->ent_in_userspace);
 	spin_unlock(&queue->lock);

 	io_uring_cmd_done(cmd, 0, 0, issue_flags);
@@ -764,7 +780,7 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
 	clear_bit(FR_PENDING, &req->flags);
 	ent->fuse_req = req;
 	ent->state = FRRS_FUSE_REQ;
-	list_move(&ent->list, &queue->ent_w_req_queue);
+	list_move_tail(&ent->list, &queue->ent_w_req_queue);
 	fuse_uring_add_to_pq(ent, req);
 }
@@ -1180,7 +1196,7 @@ static void fuse_uring_send(struct fuse_ring_ent *ent, struct io_uring_cmd *cmd,
 	spin_lock(&queue->lock);
 	ent->state = FRRS_USERSPACE;
-	list_move(&ent->list, &queue->ent_in_userspace);
+	list_move_tail(&ent->list, &queue->ent_in_userspace);
 	ent->cmd = NULL;
 	spin_unlock(&queue->lock);
fs/fuse/dir.c:
@@ -200,9 +200,14 @@ static int fuse_dentry_revalidate(struct inode *dir, const struct qstr *name,
 {
 	struct inode *inode;
 	struct fuse_mount *fm;
+	struct fuse_conn *fc;
 	struct fuse_inode *fi;
 	int ret;

+	fc = get_fuse_conn_super(dir->i_sb);
+	if (entry->d_time < atomic_read(&fc->epoch))
+		goto invalid;
+
 	inode = d_inode_rcu(entry);
 	if (inode && fuse_is_bad(inode))
 		goto invalid;
@@ -412,16 +417,20 @@ int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name
 static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
 				  unsigned int flags)
 {
-	int err;
 	struct fuse_entry_out outarg;
+	struct fuse_conn *fc;
 	struct inode *inode;
 	struct dentry *newent;
+	int err, epoch;
 	bool outarg_valid = true;
 	bool locked;

 	if (fuse_is_bad(dir))
 		return ERR_PTR(-EIO);

+	fc = get_fuse_conn_super(dir->i_sb);
+	epoch = atomic_read(&fc->epoch);
+
 	locked = fuse_lock_inode(dir);
 	err = fuse_lookup_name(dir->i_sb, get_node_id(dir), &entry->d_name,
 			       &outarg, &inode);
@@ -443,6 +452,7 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
 		goto out_err;

 	entry = newent ? newent : entry;
+	entry->d_time = epoch;
 	if (outarg_valid)
 		fuse_change_entry_timeout(entry, &outarg);
 	else
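
[Review note] The epoch stamped into d_time here and in the hunks below pairs
with FUSE_NOTIFY_INC_EPOCH in dev.c: every dentry records the connection epoch
current at instantiation, and revalidation fails any dentry stamped before the
current epoch. That is how a single atomic_inc() of fc->epoch invalidates all
cached lookups at once instead of iterating over them.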
@@ -616,7 +626,6 @@ static int fuse_create_open(struct mnt_idmap *idmap, struct inode *dir,
 				   struct dentry *entry, struct file *file,
 				   unsigned int flags, umode_t mode, u32 opcode)
 {
-	int err;
 	struct inode *inode;
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	FUSE_ARGS(args);
@@ -626,11 +635,13 @@ static int fuse_create_open(struct mnt_idmap *idmap, struct inode *dir,
 	struct fuse_entry_out outentry;
 	struct fuse_inode *fi;
 	struct fuse_file *ff;
+	int epoch, err;
 	bool trunc = flags & O_TRUNC;

 	/* Userspace expects S_IFREG in create mode */
 	BUG_ON((mode & S_IFMT) != S_IFREG);

+	epoch = atomic_read(&fm->fc->epoch);
 	forget = fuse_alloc_forget();
 	err = -ENOMEM;
 	if (!forget)
@@ -699,6 +710,7 @@ static int fuse_create_open(struct mnt_idmap *idmap, struct inode *dir,
 	}
 	kfree(forget);
 	d_instantiate(entry, inode);
+	entry->d_time = epoch;
 	fuse_change_entry_timeout(entry, &outentry);
 	fuse_dir_changed(dir);
 	err = generic_file_open(inode, file);
@@ -785,12 +797,14 @@ static struct dentry *create_new_entry(struct mnt_idmap *idmap, struct fuse_mount
 	struct fuse_entry_out outarg;
 	struct inode *inode;
 	struct dentry *d;
-	int err;
 	struct fuse_forget_link *forget;
+	int epoch, err;

 	if (fuse_is_bad(dir))
 		return ERR_PTR(-EIO);

+	epoch = atomic_read(&fm->fc->epoch);
+
 	forget = fuse_alloc_forget();
 	if (!forget)
 		return ERR_PTR(-ENOMEM);
@@ -832,10 +846,13 @@ static struct dentry *create_new_entry(struct mnt_idmap *idmap, struct fuse_mount
 	if (IS_ERR(d))
 		return d;

-	if (d)
+	if (d) {
+		d->d_time = epoch;
 		fuse_change_entry_timeout(d, &outarg);
-	else
+	} else {
+		entry->d_time = epoch;
 		fuse_change_entry_timeout(entry, &outarg);
+	}
 	fuse_dir_changed(dir);
 	return d;
@@ -1609,10 +1626,10 @@ static int fuse_permission(struct mnt_idmap *idmap,
 	return err;
 }

-static int fuse_readlink_page(struct inode *inode, struct folio *folio)
+static int fuse_readlink_folio(struct inode *inode, struct folio *folio)
 {
 	struct fuse_mount *fm = get_fuse_mount(inode);
-	struct fuse_folio_desc desc = { .length = PAGE_SIZE - 1 };
+	struct fuse_folio_desc desc = { .length = folio_size(folio) - 1 };
 	struct fuse_args_pages ap = {
 		.num_folios = 1,
 		.folios = &folio,
@@ -1667,7 +1684,7 @@ static const char *fuse_get_link(struct dentry *dentry, struct inode *inode,
 	if (!folio)
 		goto out_err;

-	err = fuse_readlink_page(inode, folio);
+	err = fuse_readlink_folio(inode, folio);
 	if (err) {
 		folio_put(folio);
 		goto out_err;
@@ -1943,6 +1960,7 @@ int fuse_do_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 	int err;
 	bool trust_local_cmtime = is_wb;
 	bool fault_blocked = false;
+	u64 attr_version;

 	if (!fc->default_permissions)
 		attr->ia_valid |= ATTR_FORCE;
@@ -2027,6 +2045,8 @@ int fuse_do_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		if (fc->handle_killpriv_v2 && !capable(CAP_FSETID))
 			inarg.valid |= FATTR_KILL_SUIDGID;
 	}
+
+	attr_version = fuse_get_attr_version(fm->fc);
 	fuse_setattr_fill(fc, &args, inode, &inarg, &outarg);
 	err = fuse_simple_request(fm, &args);
 	if (err) {
@@ -2052,6 +2072,14 @@ int fuse_do_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		/* FIXME: clear I_DIRTY_SYNC? */
 	}

+	if (fi->attr_version > attr_version) {
+		/*
+		 * Apply attributes, for example for fsnotify_change(), but set
+		 * attribute timeout to zero.
+		 */
+		outarg.attr_valid = outarg.attr_valid_nsec = 0;
+	}
+
 	fuse_change_attributes_common(inode, &outarg.attr, NULL,
 				      ATTR_TIMEOUT(&outarg),
 				      fuse_get_cache_mask(inode), 0);
@@ -2257,7 +2285,7 @@ void fuse_init_dir(struct inode *inode)

 static int fuse_symlink_read_folio(struct file *null, struct folio *folio)
 {
-	int err = fuse_readlink_page(folio->mapping->host, folio);
+	int err = fuse_readlink_folio(folio->mapping->host, folio);

 	if (!err)
 		folio_mark_uptodate(folio);
fs/fuse/file.c (474 lines changed):
@@ -415,89 +415,11 @@ u64 fuse_lock_owner_id(struct fuse_conn *fc, fl_owner_t id)

 struct fuse_writepage_args {
 	struct fuse_io_args ia;
-	struct rb_node writepages_entry;
 	struct list_head queue_entry;
-	struct fuse_writepage_args *next;
 	struct inode *inode;
 	struct fuse_sync_bucket *bucket;
 };

-static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi,
-						pgoff_t idx_from, pgoff_t idx_to)
-{
-	struct rb_node *n;
-
-	n = fi->writepages.rb_node;
-
-	while (n) {
-		struct fuse_writepage_args *wpa;
-		pgoff_t curr_index;
-
-		wpa = rb_entry(n, struct fuse_writepage_args, writepages_entry);
-		WARN_ON(get_fuse_inode(wpa->inode) != fi);
-		curr_index = wpa->ia.write.in.offset >> PAGE_SHIFT;
-		if (idx_from >= curr_index + wpa->ia.ap.num_folios)
-			n = n->rb_right;
-		else if (idx_to < curr_index)
-			n = n->rb_left;
-		else
-			return wpa;
-	}
-	return NULL;
-}
-
-/*
- * Check if any page in a range is under writeback
- */
-static bool fuse_range_is_writeback(struct inode *inode, pgoff_t idx_from,
-				    pgoff_t idx_to)
-{
-	struct fuse_inode *fi = get_fuse_inode(inode);
-	bool found;
-
-	if (RB_EMPTY_ROOT(&fi->writepages))
-		return false;
-
-	spin_lock(&fi->lock);
-	found = fuse_find_writeback(fi, idx_from, idx_to);
-	spin_unlock(&fi->lock);
-
-	return found;
-}
-
-static inline bool fuse_page_is_writeback(struct inode *inode, pgoff_t index)
-{
-	return fuse_range_is_writeback(inode, index, index);
-}
-
-/*
- * Wait for page writeback to be completed.
- *
- * Since fuse doesn't rely on the VM writeback tracking, this has to
- * use some other means.
- */
-static void fuse_wait_on_page_writeback(struct inode *inode, pgoff_t index)
-{
-	struct fuse_inode *fi = get_fuse_inode(inode);
-
-	wait_event(fi->page_waitq, !fuse_page_is_writeback(inode, index));
-}
-
-static inline bool fuse_folio_is_writeback(struct inode *inode,
-					   struct folio *folio)
-{
-	pgoff_t last = folio_next_index(folio) - 1;
-
-	return fuse_range_is_writeback(inode, folio->index, last);
-}
-
-static void fuse_wait_on_folio_writeback(struct inode *inode,
-					 struct folio *folio)
-{
-	struct fuse_inode *fi = get_fuse_inode(inode);
-
-	wait_event(fi->page_waitq, !fuse_folio_is_writeback(inode, folio));
-}
-
 /*
  * Wait for all pending writepages on the inode to finish.
  *
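
[Review note] With the temporary-folio machinery gone, fuse writeback is
tracked with the generic per-folio writeback flag (see the
folio_wait_writeback()/folio_end_writeback() calls in the hunks below). Per
the merge message, the reclaim deadlock the old scheme worked around is
instead avoided by a mapping flag that tells the MM not to block reclaim on
writeback for these mappings (the "mm: skip folio reclaim in legacy memcg
contexts for deadlockable mappings" patch in the shortlog).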
@@ -532,10 +454,6 @@ static int fuse_flush(struct file *file, fl_owner_t id)
 	if (err)
 		return err;

-	inode_lock(inode);
-	fuse_sync_writes(inode);
-	inode_unlock(inode);
-
 	err = filemap_check_errors(file->f_mapping);
 	if (err)
 		return err;
@@ -875,7 +793,7 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
 	struct inode *inode = folio->mapping->host;
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	loff_t pos = folio_pos(folio);
-	struct fuse_folio_desc desc = { .length = PAGE_SIZE };
+	struct fuse_folio_desc desc = { .length = folio_size(folio) };
 	struct fuse_io_args ia = {
 		.ap.args.page_zeroing = true,
 		.ap.args.out_pages = true,
@@ -886,13 +804,6 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio)
 	ssize_t res;
 	u64 attr_ver;

-	/*
-	 * With the temporary pages that are used to complete writeback, we can
-	 * have writeback that extends beyond the lifetime of the folio. So
-	 * make sure we read a properly synced folio.
-	 */
-	fuse_wait_on_folio_writeback(inode, folio);
-
 	attr_ver = fuse_get_attr_version(fm->fc);

 	/* Don't overflow end offset */
@@ -965,14 +876,13 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
 	fuse_io_free(ia);
 }

-static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
+static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
+				unsigned int count)
 {
 	struct fuse_file *ff = file->private_data;
 	struct fuse_mount *fm = ff->fm;
 	struct fuse_args_pages *ap = &ia->ap;
 	loff_t pos = folio_pos(ap->folios[0]);
-	/* Currently, all folios in FUSE are one page */
-	size_t count = ap->num_folios << PAGE_SHIFT;
 	ssize_t res;
 	int err;

@@ -1005,17 +915,13 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file)
 static void fuse_readahead(struct readahead_control *rac)
 {
 	struct inode *inode = rac->mapping->host;
-	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	unsigned int max_pages, nr_pages;
-	pgoff_t first = readahead_index(rac);
-	pgoff_t last = first + readahead_count(rac) - 1;
+	struct folio *folio = NULL;

 	if (fuse_is_bad(inode))
 		return;

-	wait_event(fi->page_waitq, !fuse_range_is_writeback(inode, first, last));
-
 	max_pages = min_t(unsigned int, fc->max_pages,
 			  fc->max_read / PAGE_SIZE);
@@ -1033,8 +939,8 @@ static void fuse_readahead(struct readahead_control *rac)
 	while (nr_pages) {
 		struct fuse_io_args *ia;
 		struct fuse_args_pages *ap;
-		struct folio *folio;
 		unsigned cur_pages = min(max_pages, nr_pages);
+		unsigned int pages = 0;

 		if (fc->num_background >= fc->congestion_threshold &&
 		    rac->ra->async_size >= readahead_count(rac))
@@ -1046,10 +952,12 @@ static void fuse_readahead(struct readahead_control *rac)
 		ia = fuse_io_alloc(NULL, cur_pages);
 		if (!ia)
-			return;
+			break;
 		ap = &ia->ap;

-		while (ap->num_folios < cur_pages) {
+		while (pages < cur_pages) {
+			unsigned int folio_pages;
+
 			/*
 			 * This returns a folio with a ref held on it.
 			 * The ref needs to be held until the request is
@@ -1057,13 +965,31 @@ static void fuse_readahead(struct readahead_control *rac)
 			 * fuse_try_move_page()) drops the ref after it's
 			 * replaced in the page cache.
 			 */
-			folio = __readahead_folio(rac);
-			if (!folio)
-				break;
+			if (!folio)
+				folio = __readahead_folio(rac);
+
+			folio_pages = folio_nr_pages(folio);
+			if (folio_pages > cur_pages - pages) {
+				/*
+				 * Large folios belonging to fuse will never
+				 * have more pages than max_pages.
+				 */
+				WARN_ON(!pages);
+				break;
+			}

 			ap->folios[ap->num_folios] = folio;
 			ap->descs[ap->num_folios].length = folio_size(folio);
 			ap->num_folios++;
+			pages += folio_pages;
+			folio = NULL;
 		}
-		fuse_send_readpages(ia, rac->file);
-		nr_pages -= cur_pages;
+		fuse_send_readpages(ia, rac->file, pages << PAGE_SHIFT);
+		nr_pages -= pages;
 	}
+	if (folio) {
+		folio_end_read(folio, false);
+		folio_put(folio);
+	}
 }
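
[Review note] In the readahead loop above, a folio that would overflow the
current request's page budget is not dropped: it stays cached in the local
folio pointer (which is only reset to NULL once the folio is consumed) and is
picked up by the next iteration's request. Any folio still left over when the
loop exits is completed with folio_end_read(folio, false) and released.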
@@ -1181,7 +1107,7 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
 	int err;

 	for (i = 0; i < ap->num_folios; i++)
-		fuse_wait_on_folio_writeback(inode, ap->folios[i]);
+		folio_wait_writeback(ap->folios[i]);

 	fuse_write_args_fill(ia, ff, pos, count);
 	ia->write.in.flags = fuse_write_flags(iocb);
@@ -1226,27 +1152,24 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 	struct fuse_args_pages *ap = &ia->ap;
 	struct fuse_conn *fc = get_fuse_conn(mapping->host);
 	unsigned offset = pos & (PAGE_SIZE - 1);
-	unsigned int nr_pages = 0;
 	size_t count = 0;
-	int err;
+	unsigned int num;
+	int err = 0;
+
+	num = min(iov_iter_count(ii), fc->max_write);
+	num = min(num, max_pages << PAGE_SHIFT);

 	ap->args.in_pages = true;
 	ap->descs[0].offset = offset;

-	do {
+	while (num) {
 		size_t tmp;
 		struct folio *folio;
 		pgoff_t index = pos >> PAGE_SHIFT;
-		size_t bytes = min_t(size_t, PAGE_SIZE - offset,
-				     iov_iter_count(ii));
-
-		bytes = min_t(size_t, bytes, fc->max_write - count);
+		unsigned int bytes;
+		unsigned int folio_offset;

 again:
-		err = -EFAULT;
-		if (fault_in_iov_iter_readable(ii, bytes))
-			break;
-
 		folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
 					    mapping_gfp_mask(mapping));
 		if (IS_ERR(folio)) {
@@ -1257,29 +1180,42 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 		if (mapping_writably_mapped(mapping))
 			flush_dcache_folio(folio);

-		tmp = copy_folio_from_iter_atomic(folio, offset, bytes, ii);
+		folio_offset = ((index - folio->index) << PAGE_SHIFT) + offset;
+		bytes = min(folio_size(folio) - folio_offset, num);
+
+		tmp = copy_folio_from_iter_atomic(folio, folio_offset, bytes, ii);
 		flush_dcache_folio(folio);

 		if (!tmp) {
 			folio_unlock(folio);
 			folio_put(folio);
+
+			/*
+			 * Ensure forward progress by faulting in
+			 * while not holding the folio lock:
+			 */
+			if (fault_in_iov_iter_readable(ii, bytes)) {
+				err = -EFAULT;
+				break;
+			}
+
 			goto again;
 		}

 		err = 0;
 		ap->folios[ap->num_folios] = folio;
+		ap->descs[ap->num_folios].offset = folio_offset;
 		ap->descs[ap->num_folios].length = tmp;
 		ap->num_folios++;
-		nr_pages++;

 		count += tmp;
 		pos += tmp;
+		num -= tmp;
 		offset += tmp;
-		if (offset == PAGE_SIZE)
+		if (offset == folio_size(folio))
 			offset = 0;

-		/* If we copied full page, mark it uptodate */
-		if (tmp == PAGE_SIZE)
+		/* If we copied full folio, mark it uptodate */
+		if (tmp == folio_size(folio))
 			folio_mark_uptodate(folio);

 		if (folio_test_uptodate(folio)) {
@@ -1288,10 +1224,9 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 			ia->write.folio_locked = true;
 			break;
 		}
-		if (!fc->big_writes)
+		if (!fc->big_writes || offset != 0)
 			break;
-	} while (iov_iter_count(ii) && count < fc->max_write &&
-		 nr_pages < max_pages && offset == 0);
+	}

 	return count > 0 ? count : err;
 }
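
[Review note] The refactor above aligns fuse with generic_perform_write():
instead of prefaulting the user buffer before every copy attempt, the atomic
copy is tried first, and only when it makes no progress is the folio lock
dropped and fault_in_iov_iter_readable() called before retrying. This avoids
taking a page fault while holding the folio lock, yet still guarantees
forward progress.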
@ -1638,7 +1573,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
|
|||
return res;
|
||||
}
|
||||
}
|
||||
if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
|
||||
if (!cuse && filemap_range_has_writeback(mapping, pos, (pos + count - 1))) {
|
||||
if (!write)
|
||||
inode_lock(inode);
|
||||
fuse_sync_writes(inode);
|
||||
|
|
@ -1835,38 +1770,34 @@ static ssize_t fuse_splice_write(struct pipe_inode_info *pipe, struct file *out,
|
|||
static void fuse_writepage_free(struct fuse_writepage_args *wpa)
|
||||
{
|
||||
struct fuse_args_pages *ap = &wpa->ia.ap;
|
||||
int i;
|
||||
|
||||
if (wpa->bucket)
|
||||
fuse_sync_bucket_dec(wpa->bucket);
|
||||
|
||||
for (i = 0; i < ap->num_folios; i++)
|
||||
folio_put(ap->folios[i]);
|
||||
|
||||
fuse_file_put(wpa->ia.ff, false);
|
||||
|
||||
kfree(ap->folios);
|
||||
kfree(wpa);
|
||||
}
|
||||
|
||||
static void fuse_writepage_finish_stat(struct inode *inode, struct folio *folio)
|
||||
{
|
||||
struct backing_dev_info *bdi = inode_to_bdi(inode);
|
||||
|
||||
dec_wb_stat(&bdi->wb, WB_WRITEBACK);
|
||||
node_stat_sub_folio(folio, NR_WRITEBACK_TEMP);
|
||||
wb_writeout_inc(&bdi->wb);
|
||||
}
|
||||
|
||||
static void fuse_writepage_finish(struct fuse_writepage_args *wpa)
|
||||
{
|
||||
struct fuse_args_pages *ap = &wpa->ia.ap;
|
||||
struct inode *inode = wpa->inode;
|
||||
struct fuse_inode *fi = get_fuse_inode(inode);
|
||||
struct backing_dev_info *bdi = inode_to_bdi(inode);
|
||||
int i;
|
||||
|
||||
for (i = 0; i < ap->num_folios; i++)
|
||||
fuse_writepage_finish_stat(inode, ap->folios[i]);
|
||||
for (i = 0; i < ap->num_folios; i++) {
|
||||
/*
|
||||
* Benchmarks showed that ending writeback within the
|
||||
* scope of the fi->lock alleviates xarray lock
|
||||
* contention and noticeably improves performance.
|
||||
*/
|
||||
folio_end_writeback(ap->folios[i]);
|
||||
dec_wb_stat(&bdi->wb, WB_WRITEBACK);
|
||||
wb_writeout_inc(&bdi->wb);
|
||||
}
|
||||
|
||||
wake_up(&fi->page_waitq);
|
||||
}
|
||||
|
|
@ -1877,13 +1808,15 @@ static void fuse_send_writepage(struct fuse_mount *fm,
|
|||
__releases(fi->lock)
|
||||
__acquires(fi->lock)
|
||||
{
|
||||
struct fuse_writepage_args *aux, *next;
|
||||
struct fuse_inode *fi = get_fuse_inode(wpa->inode);
|
||||
struct fuse_args_pages *ap = &wpa->ia.ap;
|
||||
struct fuse_write_in *inarg = &wpa->ia.write.in;
|
||||
struct fuse_args *args = &wpa->ia.ap.args;
|
||||
/* Currently, all folios in FUSE are one page */
|
||||
__u64 data_size = wpa->ia.ap.num_folios * PAGE_SIZE;
|
||||
int err;
|
||||
struct fuse_args *args = &ap->args;
|
||||
__u64 data_size = 0;
|
||||
int err, i;
|
||||
|
||||
for (i = 0; i < ap->num_folios; i++)
|
||||
data_size += ap->descs[i].length;
|
||||
|
||||
fi->writectr++;
|
||||
if (inarg->offset + data_size <= size) {
|
||||
|
|
@ -1914,19 +1847,8 @@ __acquires(fi->lock)
|
|||
|
||||
out_free:
|
||||
fi->writectr--;
|
||||
rb_erase(&wpa->writepages_entry, &fi->writepages);
|
||||
fuse_writepage_finish(wpa);
|
||||
spin_unlock(&fi->lock);
|
||||
|
||||
/* After rb_erase() aux request list is private */
|
||||
for (aux = wpa->next; aux; aux = next) {
|
||||
next = aux->next;
|
||||
aux->next = NULL;
|
||||
fuse_writepage_finish_stat(aux->inode,
|
||||
aux->ia.ap.folios[0]);
|
||||
fuse_writepage_free(aux);
|
||||
}
|
||||
|
||||
fuse_writepage_free(wpa);
|
||||
spin_lock(&fi->lock);
|
||||
}
|
||||
|
|
@ -1954,43 +1876,6 @@ __acquires(fi->lock)
|
|||
}
|
||||
}
|
||||
|
||||
static struct fuse_writepage_args *fuse_insert_writeback(struct rb_root *root,
|
||||
struct fuse_writepage_args *wpa)
|
||||
{
|
||||
pgoff_t idx_from = wpa->ia.write.in.offset >> PAGE_SHIFT;
|
||||
pgoff_t idx_to = idx_from + wpa->ia.ap.num_folios - 1;
|
||||
struct rb_node **p = &root->rb_node;
|
||||
struct rb_node *parent = NULL;
|
||||
|
||||
WARN_ON(!wpa->ia.ap.num_folios);
|
||||
while (*p) {
|
||||
struct fuse_writepage_args *curr;
|
||||
pgoff_t curr_index;
|
||||
|
||||
parent = *p;
|
||||
curr = rb_entry(parent, struct fuse_writepage_args,
|
||||
writepages_entry);
|
||||
WARN_ON(curr->inode != wpa->inode);
|
||||
curr_index = curr->ia.write.in.offset >> PAGE_SHIFT;
|
||||
|
||||
if (idx_from >= curr_index + curr->ia.ap.num_folios)
|
||||
p = &(*p)->rb_right;
|
||||
else if (idx_to < curr_index)
|
||||
p = &(*p)->rb_left;
|
||||
else
|
||||
return curr;
|
||||
}
|
||||
|
||||
rb_link_node(&wpa->writepages_entry, parent, p);
|
||||
rb_insert_color(&wpa->writepages_entry, root);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static void tree_insert(struct rb_root *root, struct fuse_writepage_args *wpa)
|
||||
{
|
||||
WARN_ON(fuse_insert_writeback(root, wpa));
|
||||
}
|
||||
|
||||
static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
|
||||
int error)
|
||||
{
|
||||
|
|
@ -2010,41 +1895,6 @@ static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
|
|||
if (!fc->writeback_cache)
|
||||
fuse_invalidate_attr_mask(inode, FUSE_STATX_MODIFY);
|
||||
spin_lock(&fi->lock);
|
||||
rb_erase(&wpa->writepages_entry, &fi->writepages);
|
||||
while (wpa->next) {
|
||||
struct fuse_mount *fm = get_fuse_mount(inode);
|
||||
struct fuse_write_in *inarg = &wpa->ia.write.in;
|
||||
struct fuse_writepage_args *next = wpa->next;
|
||||
|
||||
wpa->next = next->next;
|
||||
next->next = NULL;
|
||||
tree_insert(&fi->writepages, next);
|
||||
|
||||
/*
|
||||
* Skip fuse_flush_writepages() to make it easy to crop requests
|
||||
* based on primary request size.
|
||||
*
|
||||
* 1st case (trivial): there are no concurrent activities using
|
||||
* fuse_set/release_nowrite. Then we're on safe side because
|
||||
* fuse_flush_writepages() would call fuse_send_writepage()
|
||||
* anyway.
|
||||
*
|
||||
* 2nd case: someone called fuse_set_nowrite and it is waiting
|
||||
* now for completion of all in-flight requests. This happens
|
||||
* rarely and no more than once per page, so this should be
|
||||
* okay.
|
||||
*
|
||||
* 3rd case: someone (e.g. fuse_do_setattr()) is in the middle
|
||||
* of fuse_set_nowrite..fuse_release_nowrite section. The fact
|
||||
* that fuse_set_nowrite returned implies that all in-flight
|
||||
* requests were completed along with all of their secondary
|
||||
* requests. Further primary requests are blocked by negative
|
||||
* writectr. Hence there cannot be any in-flight requests and
|
||||
* no invocations of fuse_writepage_end() while we're in
|
||||
* fuse_set_nowrite..fuse_release_nowrite section.
|
||||
*/
|
||||
fuse_send_writepage(fm, next, inarg->offset + inarg->size);
|
||||
}
|
||||
fi->writectr--;
|
||||
fuse_writepage_finish(wpa);
|
||||
spin_unlock(&fi->lock);
|
||||
|
|
@ -2131,19 +1981,16 @@ static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
|
|||
}
|
||||
|
||||
static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struct folio *folio,
|
||||
struct folio *tmp_folio, uint32_t folio_index)
|
||||
uint32_t folio_index)
|
||||
{
|
||||
struct inode *inode = folio->mapping->host;
|
||||
struct fuse_args_pages *ap = &wpa->ia.ap;
|
||||
|
||||
folio_copy(tmp_folio, folio);
|
||||
|
||||
ap->folios[folio_index] = tmp_folio;
|
||||
ap->folios[folio_index] = folio;
|
||||
ap->descs[folio_index].offset = 0;
|
||||
ap->descs[folio_index].length = PAGE_SIZE;
|
||||
ap->descs[folio_index].length = folio_size(folio);
|
||||
|
||||
inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
|
||||
node_stat_add_folio(tmp_folio, NR_WRITEBACK_TEMP);
|
||||
}
|
||||
|
||||
static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio,
|
||||
|
|
@ -2178,18 +2025,12 @@ static int fuse_writepage_locked(struct folio *folio)
|
|||
struct fuse_inode *fi = get_fuse_inode(inode);
|
||||
struct fuse_writepage_args *wpa;
|
||||
struct fuse_args_pages *ap;
|
||||
struct folio *tmp_folio;
|
||||
struct fuse_file *ff;
|
||||
int error = -ENOMEM;
|
||||
int error = -EIO;
|
||||
|
||||
tmp_folio = folio_alloc(GFP_NOFS | __GFP_HIGHMEM, 0);
|
||||
if (!tmp_folio)
|
||||
goto err;
|
||||
|
||||
error = -EIO;
|
||||
ff = fuse_write_file_get(fi);
|
||||
if (!ff)
|
||||
goto err_nofile;
|
||||
goto err;
|
||||
|
||||
wpa = fuse_writepage_args_setup(folio, ff);
|
||||
error = -ENOMEM;
|
||||
|
|
@ -2200,22 +2041,17 @@ static int fuse_writepage_locked(struct folio *folio)
|
|||
ap->num_folios = 1;
|
||||
|
||||
folio_start_writeback(folio);
|
||||
fuse_writepage_args_page_fill(wpa, folio, tmp_folio, 0);
|
||||
fuse_writepage_args_page_fill(wpa, folio, 0);
|
||||
|
||||
spin_lock(&fi->lock);
|
||||
tree_insert(&fi->writepages, wpa);
|
||||
list_add_tail(&wpa->queue_entry, &fi->queued_writes);
|
||||
fuse_flush_writepages(inode);
|
||||
spin_unlock(&fi->lock);
|
||||
|
||||
folio_end_writeback(folio);
|
||||
|
||||
return 0;
|
||||
|
||||
err_writepage_args:
|
||||
fuse_file_put(ff, false);
|
||||
err_nofile:
|
||||
folio_put(tmp_folio);
|
||||
err:
|
||||
mapping_set_error(folio->mapping, error);
|
||||
return error;
|
||||
|
|
@ -2225,8 +2061,8 @@ struct fuse_fill_wb_data {
|
|||
struct fuse_writepage_args *wpa;
|
||||
struct fuse_file *ff;
|
||||
struct inode *inode;
|
||||
struct folio **orig_folios;
|
||||
unsigned int max_folios;
|
||||
unsigned int nr_pages;
|
||||
};
|
||||
|
||||
static bool fuse_pages_realloc(struct fuse_fill_wb_data *data)
|
||||
|
@@ -2260,69 +2096,11 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data)
     struct fuse_writepage_args *wpa = data->wpa;
     struct inode *inode = data->inode;
     struct fuse_inode *fi = get_fuse_inode(inode);
-    int num_folios = wpa->ia.ap.num_folios;
-    int i;
 
     spin_lock(&fi->lock);
     list_add_tail(&wpa->queue_entry, &fi->queued_writes);
     fuse_flush_writepages(inode);
     spin_unlock(&fi->lock);
-
-    for (i = 0; i < num_folios; i++)
-        folio_end_writeback(data->orig_folios[i]);
 }
 
-/*
- * Check under fi->lock if the page is under writeback, and insert it onto the
- * rb_tree if not. Otherwise iterate auxiliary write requests, to see if there's
- * one already added for a page at this offset. If there's none, then insert
- * this new request onto the auxiliary list, otherwise reuse the existing one by
- * swapping the new temp page with the old one.
- */
-static bool fuse_writepage_add(struct fuse_writepage_args *new_wpa,
-                               struct folio *folio)
-{
-    struct fuse_inode *fi = get_fuse_inode(new_wpa->inode);
-    struct fuse_writepage_args *tmp;
-    struct fuse_writepage_args *old_wpa;
-    struct fuse_args_pages *new_ap = &new_wpa->ia.ap;
-
-    WARN_ON(new_ap->num_folios != 0);
-    new_ap->num_folios = 1;
-
-    spin_lock(&fi->lock);
-    old_wpa = fuse_insert_writeback(&fi->writepages, new_wpa);
-    if (!old_wpa) {
-        spin_unlock(&fi->lock);
-        return true;
-    }
-
-    for (tmp = old_wpa->next; tmp; tmp = tmp->next) {
-        pgoff_t curr_index;
-
-        WARN_ON(tmp->inode != new_wpa->inode);
-        curr_index = tmp->ia.write.in.offset >> PAGE_SHIFT;
-        if (curr_index == folio->index) {
-            WARN_ON(tmp->ia.ap.num_folios != 1);
-            swap(tmp->ia.ap.folios[0], new_ap->folios[0]);
-            break;
-        }
-    }
-
-    if (!tmp) {
-        new_wpa->next = old_wpa->next;
-        old_wpa->next = new_wpa;
-    }
-
-    spin_unlock(&fi->lock);
-
-    if (tmp) {
-        fuse_writepage_finish_stat(new_wpa->inode,
-                                   folio);
-        fuse_writepage_free(new_wpa);
-    }
-
-    return false;
-}
-
 static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
@@ -2331,25 +2109,16 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct folio *folio,
 {
     WARN_ON(!ap->num_folios);
 
-    /*
-     * Being under writeback is unlikely but possible. For example direct
-     * read to an mmaped fuse file will set the page dirty twice; once when
-     * the pages are faulted with get_user_pages(), and then after the read
-     * completed.
-     */
-    if (fuse_folio_is_writeback(data->inode, folio))
-        return true;
-
     /* Reached max pages */
-    if (ap->num_folios == fc->max_pages)
+    if (data->nr_pages + folio_nr_pages(folio) > fc->max_pages)
         return true;
 
     /* Reached max write bytes */
-    if ((ap->num_folios + 1) * PAGE_SIZE > fc->max_write)
+    if ((data->nr_pages * PAGE_SIZE) + folio_size(folio) > fc->max_write)
         return true;
 
     /* Discontinuity */
-    if (data->orig_folios[ap->num_folios - 1]->index + 1 != folio->index)
+    if (folio_next_index(ap->folios[ap->num_folios - 1]) != folio->index)
         return true;
 
     /* Need to grow the pages array?  If so, did the expansion fail? */
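Since a single large folio can cover many pages, the send-threshold checks above count pages and bytes rather than folios. A compilable sketch of the same arithmetic with mock types (max_pages/max_write mirror the fuse_conn fields of the same names; the numbers in main are made up):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define PAGE_SIZE 4096u

struct batch {
    unsigned int nr_pages;   /* pages already gathered into the request */
};

static bool need_flush(const struct batch *b, size_t folio_bytes,
                       unsigned int max_pages, size_t max_write)
{
    size_t folio_pages = folio_bytes / PAGE_SIZE;

    /* Reached max pages: count pages, not folios, since one large
     * folio may span many pages. */
    if (b->nr_pages + folio_pages > max_pages)
        return true;

    /* Reached max write bytes: byte-accurate now that folio sizes vary. */
    if ((size_t)b->nr_pages * PAGE_SIZE + folio_bytes > max_write)
        return true;

    return false;
}

int main(void)
{
    struct batch b = { .nr_pages = 30 };

    /* A 16-page (64 KiB) folio would push a 32-page request over the top: */
    printf("%d\n", need_flush(&b, 16 * PAGE_SIZE, 32, 1u << 20)); /* 1 */
    return 0;
}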
@@ -2368,7 +2137,6 @@ static int fuse_writepages_fill(struct folio *folio,
     struct inode *inode = data->inode;
     struct fuse_inode *fi = get_fuse_inode(inode);
     struct fuse_conn *fc = get_fuse_conn(inode);
-    struct folio *tmp_folio;
     int err;
 
     if (!data->ff) {
@@ -2381,56 +2149,27 @@ static int fuse_writepages_fill(struct folio *folio,
     if (wpa && fuse_writepage_need_send(fc, folio, ap, data)) {
         fuse_writepages_send(data);
         data->wpa = NULL;
+        data->nr_pages = 0;
     }
 
-    err = -ENOMEM;
-    tmp_folio = folio_alloc(GFP_NOFS | __GFP_HIGHMEM, 0);
-    if (!tmp_folio)
-        goto out_unlock;
-
-    /*
-     * The page must not be redirtied until the writeout is completed
-     * (i.e. userspace has sent a reply to the write request). Otherwise
-     * there could be more than one temporary page instance for each real
-     * page.
-     *
-     * This is ensured by holding the page lock in page_mkwrite() while
-     * checking fuse_page_is_writeback(). We already hold the page lock
-     * since clear_page_dirty_for_io() and keep it held until we add the
-     * request to the fi->writepages list and increment ap->num_folios.
-     * After this fuse_page_is_writeback() will indicate that the page is
-     * under writeback, so we can release the page lock.
-     */
     if (data->wpa == NULL) {
         err = -ENOMEM;
         wpa = fuse_writepage_args_setup(folio, data->ff);
-        if (!wpa) {
-            folio_put(tmp_folio);
+        if (!wpa)
             goto out_unlock;
-        }
         fuse_file_get(wpa->ia.ff);
         data->max_folios = 1;
         ap = &wpa->ia.ap;
     }
     folio_start_writeback(folio);
 
-    fuse_writepage_args_page_fill(wpa, folio, tmp_folio, ap->num_folios);
-    data->orig_folios[ap->num_folios] = folio;
+    fuse_writepage_args_page_fill(wpa, folio, ap->num_folios);
+    data->nr_pages += folio_nr_pages(folio);
 
     err = 0;
-    if (data->wpa) {
-        /*
-         * Protected by fi->lock against concurrent access by
-         * fuse_page_is_writeback().
-         */
-        spin_lock(&fi->lock);
-        ap->num_folios++;
-        spin_unlock(&fi->lock);
-    } else if (fuse_writepage_add(wpa, folio)) {
-        ap->num_folios++;
-        if (!data->wpa)
-            data->wpa = wpa;
-    } else {
-        folio_end_writeback(folio);
-    }
+    ap->num_folios++;
+    if (!data->wpa)
+        data->wpa = wpa;
 out_unlock:
     folio_unlock(folio);
@@ -2456,13 +2195,7 @@ static int fuse_writepages(struct address_space *mapping,
     data.inode = inode;
     data.wpa = NULL;
     data.ff = NULL;
-
-    err = -ENOMEM;
-    data.orig_folios = kcalloc(fc->max_pages,
-                               sizeof(struct folio *),
-                               GFP_NOFS);
-    if (!data.orig_folios)
-        goto out;
+    data.nr_pages = 0;
 
     err = write_cache_pages(mapping, wbc, fuse_writepages_fill, &data);
     if (data.wpa) {
@@ -2472,7 +2205,6 @@ static int fuse_writepages(struct address_space *mapping,
     if (data.ff)
         fuse_file_put(data.ff, false);
 
-    kfree(data.orig_folios);
 out:
     return err;
 }
@@ -2497,8 +2229,6 @@ static int fuse_write_begin(struct file *file, struct address_space *mapping,
     if (IS_ERR(folio))
         goto error;
 
-    fuse_wait_on_page_writeback(mapping->host, folio->index);
-
     if (folio_test_uptodate(folio) || len >= folio_size(folio))
         goto success;
     /*
@@ -2561,13 +2291,9 @@ static int fuse_launder_folio(struct folio *folio)
 {
     int err = 0;
     if (folio_clear_dirty_for_io(folio)) {
-        struct inode *inode = folio->mapping->host;
-
-        /* Serialize with pending writeback for the same page */
-        fuse_wait_on_page_writeback(inode, folio->index);
         err = fuse_writepage_locked(folio);
         if (!err)
-            fuse_wait_on_page_writeback(inode, folio->index);
+            folio_wait_writeback(folio);
     }
     return err;
 }
@@ -2611,7 +2337,7 @@ static vm_fault_t fuse_page_mkwrite(struct vm_fault *vmf)
         return VM_FAULT_NOPAGE;
     }
 
-    fuse_wait_on_folio_writeback(inode, folio);
+    folio_wait_writeback(folio);
     return VM_FAULT_LOCKED;
 }
 
@@ -3429,9 +3155,12 @@ static const struct address_space_operations fuse_file_aops = {
 void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 {
     struct fuse_inode *fi = get_fuse_inode(inode);
+    struct fuse_conn *fc = get_fuse_conn(inode);
 
     inode->i_fop = &fuse_file_operations;
     inode->i_data.a_ops = &fuse_file_aops;
+    if (fc->writeback_cache)
+        mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
 
     INIT_LIST_HEAD(&fi->write_files);
     INIT_LIST_HEAD(&fi->queued_writes);
@@ -3439,7 +3168,6 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
     fi->iocachectr = 0;
     init_waitqueue_head(&fi->page_waitq);
     init_waitqueue_head(&fi->direct_io_waitq);
-    fi->writepages = RB_ROOT;
 
     if (IS_ENABLED(CONFIG_FUSE_DAX))
         fuse_dax_inode_init(inode, flags);

fs/fuse/fuse_dev_i.h

@@ -20,7 +20,6 @@ struct fuse_iqueue;
 struct fuse_forget_link;
 
 struct fuse_copy_state {
-    int write;
     struct fuse_req *req;
     struct iov_iter *iter;
     struct pipe_buffer *pipebufs;
@@ -30,8 +29,9 @@ struct fuse_copy_state {
     struct page *pg;
     unsigned int len;
     unsigned int offset;
-    unsigned int move_pages:1;
-    unsigned int is_uring:1;
+    bool write:1;
+    bool move_folios:1;
+    bool is_uring:1;
     struct {
         unsigned int copied_sz; /* copied size into the user buffer */
     } ring;
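One subtlety in the bool conversion above: a plain `unsigned int :1` bitfield stores only the low bit of an assigned value, while `bool :1` normalizes any nonzero value to 1. A small standalone demonstration:

#include <stdbool.h>
#include <stdio.h>

struct flags {
    bool write:1;
    bool move_folios:1;
    bool is_uring:1;
};

int main(void)
{
    struct flags f = { .write = true };

    f.move_folios = 42;  /* bool:1 stores 1; unsigned int:1 would store 42 & 1 == 0 */
    printf("%d %d %d\n", f.write, f.move_folios, f.is_uring); /* 1 1 0 */
    return 0;
}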
@@ -51,7 +51,7 @@ struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique);
 
 void fuse_dev_end_requests(struct list_head *head);
 
-void fuse_copy_init(struct fuse_copy_state *cs, int write,
+void fuse_copy_init(struct fuse_copy_state *cs, bool write,
                     struct iov_iter *iter);
 int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs,
                    unsigned int argpages, struct fuse_arg *args,
@@ -64,7 +64,6 @@ void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req);
 bool fuse_remove_pending_req(struct fuse_req *req, spinlock_t *lock);
 
 bool fuse_request_expired(struct fuse_conn *fc, struct list_head *list);
-bool fuse_fpq_processing_expired(struct fuse_conn *fc, struct list_head *processing);
 
 #endif
 

fs/fuse/fuse_i.h

@@ -74,8 +74,8 @@ extern struct list_head fuse_conn_list;
 extern struct mutex fuse_mutex;
 
 /** Module parameters */
-extern unsigned max_user_bgreq;
-extern unsigned max_user_congthresh;
+extern unsigned int max_user_bgreq;
+extern unsigned int max_user_congthresh;
 
 /* One forget request */
 struct fuse_forget_link {
@@ -161,9 +161,6 @@ struct fuse_inode {
 
     /* waitq for direct-io completion */
     wait_queue_head_t direct_io_waitq;
-
-    /* List of writepage requestst (pending or sent) */
-    struct rb_root writepages;
 };
 
 /* readdir cache (directory only) */
@@ -636,6 +633,9 @@ struct fuse_conn {
     /** Number of fuse_dev's */
     atomic_t dev_count;
 
+    /** Current epoch for up-to-date dentries */
+    atomic_t epoch;
+
     struct rcu_head rcu;
 
     /** The user id for this mount */

fs/fuse/inode.c

@@ -41,7 +41,7 @@ unsigned int fuse_max_pages_limit = 256;
 unsigned int fuse_default_req_timeout;
 unsigned int fuse_max_req_timeout;
 
-unsigned max_user_bgreq;
+unsigned int max_user_bgreq;
 module_param_call(max_user_bgreq, set_global_limit, param_get_uint,
                   &max_user_bgreq, 0644);
 __MODULE_PARM_TYPE(max_user_bgreq, "uint");
@@ -49,7 +49,7 @@ MODULE_PARM_DESC(max_user_bgreq,
                  "Global limit for the maximum number of backgrounded requests an "
                  "unprivileged user can set");
 
-unsigned max_user_congthresh;
+unsigned int max_user_congthresh;
 module_param_call(max_user_congthresh, set_global_limit, param_get_uint,
                   &max_user_congthresh, 0644);
 __MODULE_PARM_TYPE(max_user_congthresh, "uint");
@@ -962,6 +962,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
     init_rwsem(&fc->killsb);
     refcount_set(&fc->count, 1);
     atomic_set(&fc->dev_count, 1);
+    atomic_set(&fc->epoch, 1);
     init_waitqueue_head(&fc->blocked_waitq);
     fuse_iqueue_init(&fc->iq, fiq_ops, fiq_priv);
     INIT_LIST_HEAD(&fc->bg_queue);
@@ -1036,7 +1037,7 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc)
 }
 EXPORT_SYMBOL_GPL(fuse_conn_get);
 
-static struct inode *fuse_get_root_inode(struct super_block *sb, unsigned mode)
+static struct inode *fuse_get_root_inode(struct super_block *sb, unsigned int mode)
 {
     struct fuse_attr attr;
     memset(&attr, 0, sizeof(attr));
@@ -1211,7 +1212,7 @@ static const struct super_operations fuse_super_operations = {
     .show_options = fuse_show_options,
 };
 
-static void sanitize_global_limit(unsigned *limit)
+static void sanitize_global_limit(unsigned int *limit)
 {
     /*
      * The default maximum number of async requests is calculated to consume
@@ -1232,7 +1233,7 @@ static int set_global_limit(const char *val, const struct kernel_param *kp)
     if (rv)
         return rv;
 
-    sanitize_global_limit((unsigned *)kp->arg);
+    sanitize_global_limit((unsigned int *)kp->arg);
 
     return 0;
 }
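The `unsigned` to `unsigned int` hunks in this file are style-only; both spell the same C type, which a `_Generic` probe can confirm in a standalone program:

#include <stdio.h>

int main(void)
{
    /* 1 iff the compound literal (unsigned){0} has type unsigned int: */
    int same = _Generic((unsigned){0}, unsigned int: 1, default: 0);

    printf("%d\n", same); /* prints 1 */
    return 0;
}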

fs/fuse/readdir.c

@@ -161,6 +161,7 @@ static int fuse_direntplus_link(struct file *file,
     struct fuse_conn *fc;
     struct inode *inode;
     DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+    int epoch;
 
     if (!o->nodeid) {
         /*
@@ -190,6 +191,7 @@ static int fuse_direntplus_link(struct file *file,
         return -EIO;
 
     fc = get_fuse_conn(dir);
+    epoch = atomic_read(&fc->epoch);
 
     name.hash = full_name_hash(parent, name.name, name.len);
     dentry = d_lookup(parent, &name);
@@ -256,6 +258,7 @@ retry:
     }
     if (fc->readdirplus_auto)
         set_bit(FUSE_I_INIT_RDPLUS, &get_fuse_inode(inode)->state);
+    dentry->d_time = epoch;
     fuse_change_entry_timeout(dentry, o);
 
     dput(dentry);
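The epoch added to fuse_conn earlier ties in here: each positive lookup stamps the dentry (via d_time, per the hunk above) with the epoch observed before the lookup, so bumping the counter once marks every cached dentry stale without walking them. A standalone model of the idea (plain C, no locking; the real code uses an atomic and compares on revalidation):

#include <stdio.h>

struct conn  { int epoch; };
struct entry { int epoch_at_creation; };

static void lookup(const struct conn *c, struct entry *e)
{
    e->epoch_at_creation = c->epoch;   /* as in dentry->d_time = epoch */
}

static int still_valid(const struct conn *c, const struct entry *e)
{
    return e->epoch_at_creation == c->epoch;
}

int main(void)
{
    struct conn c = { .epoch = 1 };
    struct entry a, b;

    lookup(&c, &a);
    lookup(&c, &b);
    c.epoch++;   /* one increment invalidates every cached entry at once */
    printf("%d %d\n", still_valid(&c, &a), still_valid(&c, &b)); /* 0 0 */
    return 0;
}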
@@ -332,35 +335,32 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
 {
     int plus;
     ssize_t res;
-    struct folio *folio;
     struct inode *inode = file_inode(file);
     struct fuse_mount *fm = get_fuse_mount(inode);
+    struct fuse_conn *fc = fm->fc;
     struct fuse_io_args ia = {};
-    struct fuse_args_pages *ap = &ia.ap;
-    struct fuse_folio_desc desc = { .length = PAGE_SIZE };
+    struct fuse_args *args = &ia.ap.args;
+    void *buf;
+    size_t bufsize = clamp((unsigned int) ctx->count, PAGE_SIZE, fc->max_pages << PAGE_SHIFT);
     u64 attr_version = 0, evict_ctr = 0;
     bool locked;
 
-    folio = folio_alloc(GFP_KERNEL, 0);
-    if (!folio)
+    buf = kvmalloc(bufsize, GFP_KERNEL);
+    if (!buf)
         return -ENOMEM;
 
+    args->out_args[0].value = buf;
+
     plus = fuse_use_readdirplus(inode, ctx);
-    ap->args.out_pages = true;
-    ap->num_folios = 1;
-    ap->folios = &folio;
-    ap->descs = &desc;
     if (plus) {
         attr_version = fuse_get_attr_version(fm->fc);
         evict_ctr = fuse_get_evict_ctr(fm->fc);
-        fuse_read_args_fill(&ia, file, ctx->pos, PAGE_SIZE,
-                            FUSE_READDIRPLUS);
+        fuse_read_args_fill(&ia, file, ctx->pos, bufsize, FUSE_READDIRPLUS);
     } else {
-        fuse_read_args_fill(&ia, file, ctx->pos, PAGE_SIZE,
-                            FUSE_READDIR);
+        fuse_read_args_fill(&ia, file, ctx->pos, bufsize, FUSE_READDIR);
     }
     locked = fuse_lock_inode(inode);
-    res = fuse_simple_request(fm, &ap->args);
+    res = fuse_simple_request(fm, args);
     fuse_unlock_inode(inode, locked);
     if (res >= 0) {
         if (!res) {
@@ -369,16 +369,14 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
         if (ff->open_flags & FOPEN_CACHE_DIR)
             fuse_readdir_cache_end(file, ctx->pos);
     } else if (plus) {
-        res = parse_dirplusfile(folio_address(folio), res,
-                                file, ctx, attr_version,
+        res = parse_dirplusfile(buf, res, file, ctx, attr_version,
                                 evict_ctr);
     } else {
-        res = parse_dirfile(folio_address(folio), res, file,
-                            ctx);
+        res = parse_dirfile(buf, res, file, ctx);
     }
     }
 
-    folio_put(folio);
+    kvfree(buf);
     fuse_invalidate_atime(inode);
     return res;
 }
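The readdir request is now sized from the caller's getdents buffer (dir_context.count) instead of a fixed single page. A compilable restatement of the clamp (constants are common x86-64 values, used here for illustration; the real code takes the limit from the connection's max_pages):

#include <stdio.h>

#define PAGE_SIZE  4096u
#define PAGE_SHIFT 12

static unsigned int readdir_bufsize(unsigned int ctx_count, unsigned int max_pages)
{
    unsigned int lo = PAGE_SIZE;
    unsigned int hi = max_pages << PAGE_SHIFT;

    if (ctx_count < lo)
        return lo;
    if (ctx_count > hi)
        return hi;
    return ctx_count;
}

int main(void)
{
    /* glibc getdents64 buffers are typically ~32 KiB, so with
     * max_pages = 256 the request grows from one page to the full
     * caller buffer: */
    printf("%u\n", readdir_bufsize(32768, 256)); /* 32768 */
    printf("%u\n", readdir_bufsize(100, 256));   /* 4096  */
    return 0;
}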

include/linux/pagemap.h

@@ -210,6 +210,7 @@ enum mapping_flags {
     AS_STABLE_WRITES = 7,   /* must wait for writeback before modifying
                                folio contents */
     AS_INACCESSIBLE = 8,    /* Do not attempt direct R/W access to the mapping */
+    AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
     /* Bits 16-25 are used for FOLIO_ORDER */
     AS_FOLIO_ORDER_BITS = 5,
     AS_FOLIO_ORDER_MIN = 16,
@@ -335,6 +336,16 @@ static inline bool mapping_inaccessible(struct address_space *mapping)
     return test_bit(AS_INACCESSIBLE, &mapping->flags);
 }
 
+static inline void mapping_set_writeback_may_deadlock_on_reclaim(struct address_space *mapping)
+{
+    set_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
+}
+
+static inline bool mapping_writeback_may_deadlock_on_reclaim(struct address_space *mapping)
+{
+    return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
     return mapping->gfp_mask;
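These helpers are one-bit accessors on the address_space flags word. The pattern they enable — the filesystem opts in at inode init, reclaim tests the bit before waiting — can be sketched standalone (mock types; the bit value mirrors AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9 but is illustrative):

#include <stdio.h>

#define MOCK_WB_MAY_DEADLOCK_ON_RECLAIM (1ul << 9)

struct mock_address_space {
    unsigned long flags;
};

/* Filesystem side: set once when writeback completion depends on an
 * external agent (for FUSE, the userspace server). */
static void opt_in(struct mock_address_space *m)
{
    m->flags |= MOCK_WB_MAY_DEADLOCK_ON_RECLAIM;
}

/* Reclaim side: instead of waiting for writeback (which could block on
 * that same agent), skip the folio and move on. */
static int may_wait_for_writeback(const struct mock_address_space *m)
{
    return !(m->flags & MOCK_WB_MAY_DEADLOCK_ON_RECLAIM);
}

int main(void)
{
    struct mock_address_space m = { 0 };

    printf("%d\n", may_wait_for_writeback(&m)); /* 1 */
    opt_in(&m);
    printf("%d\n", may_wait_for_writeback(&m)); /* 0 */
    return 0;
}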

include/uapi/linux/fuse.h

@@ -232,6 +232,9 @@
  *
  *  7.43
  *  - add FUSE_REQUEST_TIMEOUT
+ *
+ *  7.44
+ *  - add FUSE_NOTIFY_INC_EPOCH
  */
 
 #ifndef _LINUX_FUSE_H
@@ -267,7 +270,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 43
+#define FUSE_KERNEL_MINOR_VERSION 44
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -671,6 +674,7 @@ enum fuse_notify_code {
     FUSE_NOTIFY_RETRIEVE = 5,
     FUSE_NOTIFY_DELETE = 6,
     FUSE_NOTIFY_RESEND = 7,
+    FUSE_NOTIFY_INC_EPOCH = 8,
     FUSE_NOTIFY_CODE_MAX,
 };
 
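Nothing in this diff shows how a server triggers the new code, but FUSE reverse notifications are conventionally emitted by the daemon writing a fuse_out_header to /dev/fuse with unique == 0 and the notify code in the error field. Under that assumption (verify against your kernel and libfuse before relying on it), an epoch bump with no payload could look like:

#include <stdint.h>
#include <string.h>
#include <unistd.h>

struct fuse_out_header {          /* layout as in <linux/fuse.h> */
    uint32_t len;
    int32_t  error;
    uint64_t unique;
};

#define FUSE_NOTIFY_INC_EPOCH 8

static int notify_inc_epoch(int fuse_dev_fd)
{
    struct fuse_out_header oh;

    memset(&oh, 0, sizeof(oh));
    oh.len    = sizeof(oh);             /* no payload for this notification */
    oh.error  = FUSE_NOTIFY_INC_EPOCH;  /* notify code, not an errno */
    oh.unique = 0;                      /* 0 marks a notification */

    return write(fuse_dev_fd, &oh, sizeof(oh)) == (ssize_t)sizeof(oh) ? 0 : -1;
}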

mm/vmscan.c

@@ -1197,8 +1197,10 @@ retry:
          * 2) Global or new memcg reclaim encounters a folio that is
          *    not marked for immediate reclaim, or the caller does not
          *    have __GFP_FS (or __GFP_IO if it's simply going to swap,
-         *    not to fs). In this case mark the folio for immediate
-         *    reclaim and continue scanning.
+         *    not to fs), or the folio belongs to a mapping where
+         *    waiting on writeback during reclaim may lead to a deadlock.
+         *    In this case mark the folio for immediate reclaim and
+         *    continue scanning.
          *
          * Require may_enter_fs() because we would wait on fs, which
          * may not have submitted I/O yet. And the loop driver might
@@ -1223,6 +1225,8 @@ retry:
          * takes to write them to disk.
          */
         if (folio_test_writeback(folio)) {
+            mapping = folio_mapping(folio);
+
             /* Case 1 above */
             if (current_is_kswapd() &&
                 folio_test_reclaim(folio) &&
@@ -1233,7 +1237,9 @@ retry:
             /* Case 2 above */
             } else if (writeback_throttling_sane(sc) ||
                        !folio_test_reclaim(folio) ||
-                       !may_enter_fs(folio, sc->gfp_mask)) {
+                       !may_enter_fs(folio, sc->gfp_mask) ||
+                       (mapping &&
+                        mapping_writeback_may_deadlock_on_reclaim(mapping))) {
                 /*
                  * This is slightly racy -
                  * folio_end_writeback() might have
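Restated as a bare predicate, the extended "case 2" test above says: do not wait on a writeback folio if reclaim throttling is sane, or the folio is not yet marked for reclaim, or we cannot enter the FS, or — the new part — the mapping says waiting may deadlock. A standalone version of that logic:

#include <stdbool.h>
#include <stdio.h>

static bool skip_wait_on_writeback(bool throttling_sane, bool marked_reclaim,
                                   bool can_enter_fs, bool may_deadlock)
{
    return throttling_sane || !marked_reclaim || !can_enter_fs || may_deadlock;
}

int main(void)
{
    /* Legacy-memcg reclaim, folio already marked, FS entry allowed:
     * the old code would fall through and wait; the mapping flag alone
     * now prevents the wait. */
    printf("%d\n", skip_wait_on_writeback(false, true, true, true)); /* 1 */
    return 0;
}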