Commit Graph

1292 Commits (cb53a93e33e108bfec00a13cd12696e1c0ccc8b6)

Author SHA1 Message Date
Linus Torvalds 96890bc2ea NFS client updates for Linux 5.14
Highlights include:
 
 Stable fixes:
 - Two sunrpc fixes for deadlocks involving privileged rpc_wait_queues
 
 Bugfixes
 - SUNRPC: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base()
 - SUNRPC: prevent port reuse on transports which don't request it.
 - NFSv3: Fix memory leak in posix_acl_create()
 - NFS: Various fixes to attribute revalidation timeouts
 - NFSv4: Fix handling of non-atomic change attribute updates
 - NFSv4: If a server is down, don't cause mounts to other servers to
   hang as well
 - pNFS: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT
 - NFS: Fix mount failures due to incorrect setting of the has_sec_mnt_opts
   filesystem flag
  - NFS: Ensure nfs_readpage returns promptly when an internal error occurs
  - NFS: Fix fscache read from NFS after cache error
  - pNFS: Various bugfixes around the LAYOUTGET operation
 
 Features
 - Multiple patches to add support for fcntl() leases over NFSv4.
 - A sysfs interface to display more information about the various
   transport connections used by the RPC client
 - A sysfs interface to allow a suitably privileged user to offline a
   transport that may no longer point to a valid server
 - A sysfs interface to allow a suitably privileged user to change the
   server IP address used by the RPC client
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmDnPrkACgkQZwvnipYK
 APKGAw/9EjdGoic6VpShQyb5uxaRoDd4uwWgFLBOfPhIWC7qNMtkj49wHIOEUm7e
 YfdF5RlCdmshaoMxjY84wjl8NTMwHbPahgooDd4+UsZUs2qxZ8dBsr0itfbFsJv8
 BpaCYKQt6XGQngGrWfC7SiCETnMej2YsmjDfHvhD58TxnRfPWexHUvx9xi9uGRCS
 sIWRA2QMNs7LwdShkkRotagodRLhu/zo4g0lon5lI8D/SRg6o8RoO4YP6oKH1FN4
 OyVzy1aWZGocgwCMUtNeuigJSRyDa+bJTfJ2c27uw5g18s0XWZ3j2DxD5I+HCEuE
 B4rhg+ujtPIifYLHf2Aj3nlxdBePZ5L67a2MOOUo+wSD+nPmNMZF1eIT/3Jsg/HA
 Z8gqcBiTIkBfVGJxWWbrbHfxPXQiK1IRGQx9acyhLCN9M6Kv5bbkn4R4dnronvJR
 g6O968fgC5uvl60CXdc8NCpWtSitXB/nH8pn7MbJ8JBGq7QIYNkS0d4E8ePhYwxk
 sRYJt21O+ryjodfQDHaUxodzCKGcpRoknpirMmgoAp4zdkva4ltViNsQvHa7jFh8
 HIuhU6Aia1xVYpUMDEXf2WMXCT9yLa2TyMDuS5KDfb69wBkQJWeKNkebf+1k03wQ
 saEmdoP4aEEujimkA7rqyOlI8XhsudKvBd3HXg+w9+xIt4yoie0=
 =NaOI
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - Multiple patches to add support for fcntl() leases over NFSv4.

   - A sysfs interface to display more information about the various
     transport connections used by the RPC client

   - A sysfs interface to allow a suitably privileged user to offline a
     transport that may no longer point to a valid server

   - A sysfs interface to allow a suitably privileged user to change the
     server IP address used by the RPC client

  Stable fixes:

   - Two sunrpc fixes for deadlocks involving privileged rpc_wait_queues

  Bugfixes:

   - SUNRPC: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base()

   - SUNRPC: prevent port reuse on transports which don't request it.

   - NFSv3: Fix memory leak in posix_acl_create()

   - NFS: Various fixes to attribute revalidation timeouts

   - NFSv4: Fix handling of non-atomic change attribute updates

   - NFSv4: If a server is down, don't cause mounts to other servers to
     hang as well

   - pNFS: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT

   - NFS: Fix mount failures due to incorrect setting of the
     has_sec_mnt_opts filesystem flag

   - NFS: Ensure nfs_readpage returns promptly when an internal error
     occurs

   - NFS: Fix fscache read from NFS after cache error

   - pNFS: Various bugfixes around the LAYOUTGET operation"

* tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits)
  NFSv4/pNFS: Return an error if _nfs4_pnfs_v3_ds_connect can't load NFSv3
  NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times
  NFSv4/pnfs: Clean up layout get on open
  NFSv4/pnfs: Fix layoutget behaviour after invalidation
  NFSv4/pnfs: Fix the layout barrier update
  NFS: Fix fscache read from NFS after cache error
  NFS: Ensure nfs_readpage returns promptly when internal error occurs
  sunrpc: remove an offlined xprt using sysfs
  sunrpc: provide showing transport's state info in the sysfs directory
  sunrpc: display xprt's queuelen of assigned tasks via sysfs
  sunrpc: provide multipath info in the sysfs directory
  NFSv4.1 identify and mark RPC tasks that can move between transports
  sunrpc: provide transport info in the sysfs directory
  SUNRPC: take a xprt offline using sysfs
  sunrpc: add dst_attr attributes to the sysfs xprt directory
  SUNRPC for TCP display xprt's source port in sysfs xprt_info
  SUNRPC query transport's source port
  SUNRPC display xprt's main value in sysfs's xprt_info
  SUNRPC mark the first transport
  sunrpc: add add sysfs directory per xprt under each xprt_switch
  ...
2021-07-09 09:43:57 -07:00
Olga Kornievskaia 6f081693e7 sunrpc: remove an offlined xprt using sysfs
Once a transport has been put offline, this transport can be also
removed from the list of transports. Any tasks that have been stuck
on this transport would find the next available active transport
and be re-tried. This transport would be removed from the xprt_switch
list and freed.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia 85e39feead NFSv4.1 identify and mark RPC tasks that can move between transports
In preparation for when we can re-try a task on a different transport,
identify and mark such RPC tasks as moveable. Only 4.1+ operarations can
be re-tried on a different transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia 5b7eb78486 SUNRPC: take a xprt offline using sysfs
Using sysfs's xprt_state attribute, mark a particular transport offline.
It will not be picked during the round-robin selection. It's not allowed
to take the main (1st created transport associated with the rpc_client)
offline. Also bring a transport back online via sysfs by writing "online"
and that would allow for this transport to be picked during the round-
robin selection.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia a8482488a7 SUNRPC query transport's source port
Provide ability to query transport's source port.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia e091853ebd SUNRPC mark the first transport
When an RPC client gets created it's first transport is special
and should be marked a main transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia 587bc7255d sunrpc: add dst_attr attributes to the sysfs xprt directory
Allow to query and set the destination's address of a transport.
Setting of the destination address is allowed only for TCP or RDMA
based connections.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia d408ebe04a sunrpc: add add sysfs directory per xprt under each xprt_switch
Add individual transport directories under each transport switch
group. For instance, for each nconnect=X connections there will be
a transport directory. Naming conventions also identifies transport
type -- xprt-<id>-<type> where type is udp, tcp, rdma, local, bc.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Olga Kornievskaia baea99445d sunrpc: add xprt_switch direcotry to sunrpc's sysfs
Add xprt_switch directory to the sysfs and create individual
xprt_swith subdirectories for multipath transport group.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Olga Kornievskaia d3abc73987 sunrpc: keep track of the xprt_class in rpc_xprt structure
We need to keep track of the type for a given transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Olga Kornievskaia 5b9268727f sunrpc: add IDs to multipath
This is used to uniquely identify sunrpc multipath objects in /sys.

Signed-off-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Olga Kornievskaia 572caba402 sunrpc: add xprt id
This adds a unique identifier for a sunrpc transport in sysfs, which is
similarly managed to the unique IDs of clients.

Signed-off-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Olga Kornievskaia c5a382ebdb sunrpc: Create per-rpc_clnt sysfs kobjects
These will eventually have files placed under them for sysfs operations.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Andy Shevchenko 4c52729377 kernel.h: split out kstrtox() and simple_strtox() to a separate header
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out kstrtox() and
simple_strtox() helpers.

At the same time convert users in header and lib folders to use new
header.  Though for time being include new header back to kernel.h to
avoid twisted indirected includes for existing users.

[andy.shevchenko@gmail.com: fix documentation references]
  Link: https://lkml.kernel.org/r/20210615220003.377901-1-andy.shevchenko@gmail.com

Link: https://lkml.kernel.org/r/20210611185815.44103-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Francis Laniel <laniel_francis@privacyrequired.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Kars Mulder <kerneldev@karsmulder.nl>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:05 -07:00
Trond Myklebust e86be3a04b SUNRPC: More fixes for backlog congestion
Ensure that we fix the XPRT_CONGESTED starvation issue for RDMA as well
as socket based transports.
Ensure we always initialise the request after waking up from the backlog
list.

Fixes: e877a88d1f ("SUNRPC in case of backlog, hand free slots directly to waiting task")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-26 06:36:13 -04:00
Linus Torvalds a647034fe2 NFS client updates for Linux 5.13
Highlights include:
 
 Stable fixes:
 - Add validation of the UDP retrans parameter to prevent shift out-of-bounds
 - Don't discard pNFS layout segments that are marked for return
 
 Bugfixes:
 - Fix a NULL dereference crash in xprt_complete_bc_request() when the
   NFSv4.1 server misbehaves.
 - Fix the handling of NFS READDIR cookie verifiers
 - Sundry fixes to ensure attribute revalidation works correctly when the
   server does not return post-op attributes.
 - nfs4_bitmask_adjust() must not change the server global bitmasks
 - Fix major timeout handling in the RPC code.
 - NFSv4.2 fallocate() fixes.
 - Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling
 - Copy offload attribute revalidation fixes
 - Fix an incorrect filehandle size check in the pNFS flexfiles driver
 - Fix several RDMA transport setup/teardown races
 - Fix several RDMA queue wrapping issues
 - Fix a misplaced memory read barrier in sunrpc's call_decode()
 
 Features:
 - Micro optimisation of the TCP transmission queue using TCP_CORK
 - statx() performance improvements by further splitting up the tracking
   of invalid cached file metadata.
 - Support the NFSv4.2 "change_attr_type" attribute and use it to
   optimise handling of change attribute updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmCVLooACgkQZwvnipYK
 APJB5BAAtIJyhx40ooMBzcucDmXd1qovlKsb8ZlvnSI6c7wvHhFPNk9z4zwThnjL
 FpVYzJzK6XzAQY/PtgbrPwnSUmW925ngPWYR/hiYe+OGPBnYV+tXP8izCyEkNgMg
 45goDOxojGWl7AGTuAJiKcDSdH9PyIrbvt28iwcNSGjslasGSbAoL/836l4OIGr1
 Ymxs/NDML11dPco8GIKLGtHd8leFGleDx089VeNsgud8MdaFErp16O5Iz8DdzRKd
 W1l2zDMb05j8eDZIfy3w3FyrLkDXA+KgLSADiC8TcpxoadPaQJMeCvoIq8oqVndn
 bZBoxduXdLgf54Aec0WnNKFAOyc7pGvZoSNmFouT7EGV73g+g1LQ+ZbEE1bb8fCQ
 XHqCVaBt2+47NiTUgdxjXlZRfcn8fYKx0tVxfG3mQVMXUAWfsjmMyQMNgijDRJI2
 8Wz3lZMRGMILbR9j4QpP1biVy/2zGNWG/TB5ZZyZMSY4uT+aOpzlqdknb4UsRaSp
 f7MfmB7xEWpS4DJr9RIBrJ/hIdnMu1mNInxDPFo5Kl5HNp4TaPm2dPir2ZD2wMZI
 daURTX7giUhpE15ZebQDBqWD+mTR0bVDqLLeo131JRmMfMEHugNrr49xe+NkBu/R
 QWnFzgkGdQsOeiKRRwEUuhsi74JspqfwzdZzHqcRM5WuXVvBLcA=
 =h01b
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - Add validation of the UDP retrans parameter to prevent shift
     out-of-bounds

   - Don't discard pNFS layout segments that are marked for return

  Bugfixes:

   - Fix a NULL dereference crash in xprt_complete_bc_request() when the
     NFSv4.1 server misbehaves.

   - Fix the handling of NFS READDIR cookie verifiers

   - Sundry fixes to ensure attribute revalidation works correctly when
     the server does not return post-op attributes.

   - nfs4_bitmask_adjust() must not change the server global bitmasks

   - Fix major timeout handling in the RPC code.

   - NFSv4.2 fallocate() fixes.

   - Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling

   - Copy offload attribute revalidation fixes

   - Fix an incorrect filehandle size check in the pNFS flexfiles driver

   - Fix several RDMA transport setup/teardown races

   - Fix several RDMA queue wrapping issues

   - Fix a misplaced memory read barrier in sunrpc's call_decode()

  Features:

   - Micro optimisation of the TCP transmission queue using TCP_CORK

   - statx() performance improvements by further splitting up the
     tracking of invalid cached file metadata.

   - Support the NFSv4.2 'change_attr_type' attribute and use it to
     optimise handling of change attribute updates"

* tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (85 commits)
  xprtrdma: Fix a NULL dereference in frwr_unmap_sync()
  sunrpc: Fix misplaced barrier in call_decode
  NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code.
  xprtrdma: Move fr_mr field to struct rpcrdma_mr
  xprtrdma: Move the Work Request union to struct rpcrdma_mr
  xprtrdma: Move fr_linv_done field to struct rpcrdma_mr
  xprtrdma: Move cqe to struct rpcrdma_mr
  xprtrdma: Move fr_cid to struct rpcrdma_mr
  xprtrdma: Remove the RPC/RDMA QP event handler
  xprtrdma: Don't display r_xprt memory addresses in tracepoints
  xprtrdma: Add an rpcrdma_mr_completion_class
  xprtrdma: Add tracepoints showing FastReg WRs and remote invalidation
  xprtrdma: Avoid Send Queue wrapping
  xprtrdma: Do not wake RPC consumer on a failed LocalInv
  xprtrdma: Do not recycle MR after FastReg/LocalInv flushes
  xprtrdma: Clarify use of barrier in frwr_wc_localinv_done()
  xprtrdma: Rename frwr_release_mr()
  xprtrdma: rpcrdma_mr_pop() already does list_del_init()
  xprtrdma: Delete rpcrdma_recv_buffer_put()
  xprtrdma: Fix cwnd update ordering
  ...
2021-05-07 11:23:41 -07:00
Trond Myklebust d737e5d418 SUNRPC: Set TCP_CORK until the transmit queue is empty
When we have multiple RPC requests queued up, it makes sense to set the
TCP_CORK option while the transmit queue is non-empty.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-05 09:04:20 -04:00
Chuck Lever 5533c4f4b9 svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_arg
These fields are no longer used.

The size of struct svc_rdma_recv_ctxt is now less than 300 bytes on
x86_64, down from 2440 bytes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-31 15:57:48 -04:00
Chuck Lever 9af723be86 svcrdma: Remove sc_read_complete_q
Now that svc_rdma_recvfrom() waits for Read completion,
sc_read_complete_q is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-31 15:57:48 -04:00
Chuck Lever 7dcfbd86ad SUNRPC: Export svc_xprt_received()
Prepare svc_xprt_received() to be called from transport code instead
of from generic RPC server code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever 579900670a svcrdma: Remove unused sc_pages field
Clean up. This significantly reduces the size of struct
svc_rdma_send_ctxt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever 2a1e4f21d8 svcrdma: Normalize Send page handling
Currently svc_rdma_sendto() migrates xdr_buf pages into a separate
page list and NULLs out a bunch of entries in rq_pages while the
pages are under I/O. The Send completion handler then frees those
pages later.

Instead, let's wait for the Send completion, then handle page
releasing in the nfsd thread. I'd like to avoid the cost of 250+
put_page() calls in the Send completion handler, which is single-
threaded.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever e844d307d4 svcrdma: Add a "deferred close" helper
Refactor a bit of commonly used logic so that every site that wants
a close deferred to an nfsd thread does all the right things
(set_bit(XPT_CLOSE) then enqueue).

Also, once XPT_CLOSE is set on a transport, it is never cleared. If
XPT_CLOSE is already set, then the close is already being handled
and the enqueue can be skipped.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever c558d47596 svcrdma: Maintain a Receive water mark
Post more Receives when the number of pending Receives drops below
a water mark. The batch mechanism is disabled if the underlying
device cannot support a reasonably-sized Receive Queue.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 13:22:13 -04:00
Chuck Lever ded04a587f NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 10:18:55 -04:00
Chuck Lever cc9bcdad77 NFSD: Update the NFSv3 READ3res encode to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 10:18:53 -04:00
Chuck Lever bddfdbcddb NFSD: Extract the svcxdr_init_encode() helper
NFSD initializes an encode xdr_stream only after the RPC layer has
already inserted the RPC Reply header. Thus it behaves differently
than xdr_init_encode does, which assumes the passed-in xdr_buf is
entirely devoid of content.

nfs4proc.c has this server-side stream initialization helper, but
it is visible only to the NFSv4 code. Move this helper to a place
that can be accessed by NFSv2 and NFSv3 server XDR functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22 10:18:51 -04:00
Chuck Lever bade4be69a svcrdma: Revert "svcrdma: Reduce Receive doorbell rate"
I tested commit 43042b90ca ("svcrdma: Reduce Receive doorbell
rate") with mlx4 (IB) and software iWARP and didn't find any
issues. However, I recently got my hardware iWARP setup back on
line (FastLinQ) and it's crashing hard on this commit (confirmed
via bisect).

The failure mode is complex.
 - After a connection is established, the first Receive completes
   normally.
 - But the second and third Receives have garbage in their Receive
   buffers. The server responds with ERR_VERS as a result.
 - When the client tears down the connection to retry, a couple
   of posted Receives flush twice, and that corrupts the recv_ctxt
   free list.
 - __svc_rdma_free then faults or loops infinitely while destroying
   the xprt's recv_ctxts.

Since 43042b90ca ("svcrdma: Reduce Receive doorbell rate") does
not fix a bug but is a scalability enhancement, it's safe and
appropriate to revert it while working on a replacement.

Fixes: 43042b90ca ("svcrdma: Reduce Receive doorbell rate")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-11 15:26:07 -05:00
Linus Torvalds 7c70f3a748 Optimization:
- Cork the socket while there are queued replies
 
 Fixes:
 
 - DRC shutdown ordering
 - svc_rdma_accept() lockdep splat
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmAsA80ACgkQM2qzM29m
 f5erXA/+MrR3ZtwK2eaTITu13TzzTrMURbp/n0wCCW/Ls1YMb6bn9ggtBwu2W5Cn
 Vb0RO9OLcmoI6CjqPh0CTUvvZspMYOAX4W1jQecKt2ml075APdlqUcv9YWPUQqVJ
 qTg8HxDymvHvY3I3FcBxhzofmGzF8AOmQZJw9uI5Wt/ivBfqGWcAGlxyRmB3mdsm
 cJRK0Sy7QMn2LefMcpMEeSbPA049/NZNRp6fcXnpPQFer42thoosYsNhTlAJfCXC
 C5S0z3/T6rpuJucV9la/WkpUA0YhWbPEHWNdAB5tzSqmoEo4LpzJzjv7uyQU4oue
 QlmChIz9qasgTI/BnCkBIzPD99S4UQcXjX0BnNinkQ77e6+b/vdAR+T+NLHJdkAf
 +7Xz6T9aZNaz2R49CjYl6/kG0rlNkjUzyURRYs/9zEBhogMPH/N4T7Z2M+ljCkeb
 tc3OaFDXZ2rfr7EKBGsfnEKINM1gpYipzILkr8GSHUMZLzOB/64upKySaJVjCGXj
 7Sf1w+vJUWwYc+FqFvbaR4ybr01VIfdsecpn1TtY870zG1JzimzAHVZk1/xC9+CX
 J+lVOXbjawDl1Et3V3fWq6Y7mhAWves/NKPcbSug9sFc4qRHEmPbAq/RRtlsjQcn
 foMr5R8qd8OwEamVypZ2nIFxq4q3b742AS8lZhaK+DyZKq3oLac=
 =+R4U
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull more nfsd updates from Chuck Lever:
 "Here are a few additional NFSD commits for the merge window:

 Optimization:
   - Cork the socket while there are queued replies

  Fixes:
   - DRC shutdown ordering
   - svc_rdma_accept() lockdep splat"

* tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Further clean up svc_tcp_sendmsg()
  SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg()
  SUNRPC: Use TCP_CORK to optimise send performance on the server
  svcrdma: Hold private mutex while invoking rdma_accept()
  nfsd: register pernet ops last, unregister first
2021-02-22 13:29:55 -08:00
Linus Torvalds 99f1a5872b Highlights:
- Update NFSv2 and NFSv3 XDR decoding functions
 - Further improve support for re-exporting NFS mounts
 - Convert NFSD stats to per-CPU counters
 - Add batch Receive posting to the server's RPC/RDMA transport
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmAYVsAACgkQM2qzM29m
 f5f1Lg/+IBC7Bhnnc8jNr4nv4IntCwwKdx2VzSzQszbN/kkhLZK89u36nZyqp0RB
 Vg3olyS5DseEisMMx0rI0KkHBz7pz+kXVdOGvve8fHBZvewnJ/FpxNZPChG4aMDc
 mfjHLvDHO0/GoUqSftrBrjSEJ2jHoNdDcmvzgdAlugTuLOjGX3HhmKa3ZYVTNgFn
 kDmFMaEHjS3pb3LqNDHNIYYpNnvtIukxHUh9weDvr+AH8Rmt/WVfjDc26xBS0FQu
 jDJUk9AP06VYgZx0dLKp4In8GJYwz9DNjNrWm91+RyJml9AWrFswdBHHcfi0W/Yy
 GipkBZGYE6ZblyMlITZCB4etyHQsq7qLuqicTlcXjL/Fdkd7xlT8DwFlZ8LjpyCU
 LeHTI2cGzRSJ/JjL2hvhPvT3gR5hln/qk17jSP7V4S6psZAqAEvw/Xa/+MDJhB/b
 vnzltFPvEgZc59Q/SJLbaWZLHy1q0enbrOBLMZDmUlk911/tgAuflHJM60N8o732
 vkfy05pvZlrV0cFY546pQd7zTKZcAOYPVHHoP25wPa2ibKBu6eQ6kZEi5zu+tVK3
 CkvqIhePFspBMQ6GOPKixTiFV4KFoO1HBtk+JEeMkiHXHk1xATCWbg1m7wkaagsq
 NNS/qFkLRnftGYpFViBaxTFBGxiBOSbsTIS/zfj5L7JOpW4FRD4=
 =02xw
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:

 - Update NFSv2 and NFSv3 XDR decoding functions

 - Further improve support for re-exporting NFS mounts

 - Convert NFSD stats to per-CPU counters

 - Add batch Receive posting to the server's RPC/RDMA transport

* tag 'nfsd-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (65 commits)
  nfsd: skip some unnecessary stats in the v4 case
  nfs: use change attribute for NFS re-exports
  NFSv4_2: SSC helper should use its own config.
  nfsd: cstate->session->se_client -> cstate->clp
  nfsd: simplify nfsd4_check_open_reclaim
  nfsd: remove unused set_client argument
  nfsd: find_cpntf_state cleanup
  nfsd: refactor set_client
  nfsd: rename lookup_clientid->set_client
  nfsd: simplify nfsd_renew
  nfsd: simplify process_lock
  nfsd4: simplify process_lookup1
  SUNRPC: Correct a comment
  svcrdma: DMA-sync the receive buffer in svc_rdma_recvfrom()
  svcrdma: Reduce Receive doorbell rate
  svcrdma: Deprecate stat variables that are no longer used
  svcrdma: Restore read and write stats
  svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter
  svcrdma: Convert rdma_stat_recv to a per-CPU counter
  svcrdma: Refactor svc_rdma_init() and svc_rdma_clean_up()
  ...
2021-02-21 10:22:20 -08:00
Trond Myklebust e0a912e8dd SUNRPC: Use TCP_CORK to optimise send performance on the server
Use a counter to keep track of how many requests are queued behind the
xprt->xpt_mutex, and keep TCP_CORK set until the queue is empty.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Link: https://lore.kernel.org/linux-nfs/20210213202532.23146-1-trondmy@kernel.org/T/#u
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-02-16 12:32:31 -05:00
Dave Wysochanski ba6dfce47c SUNRPC: Move simple_get_bytes and simple_get_netobj into private header
Remove duplicated helper functions to parse opaque XDR objects
and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h.
In the new file carry the license and copyright from the source file
net/sunrpc/auth_gss/auth_gss.c.  Finally, update the comment inside
include/linux/sunrpc/xdr.h since lockd is not the only user of
struct xdr_netobj.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-25 15:59:12 -05:00
Chuck Lever 43042b90ca svcrdma: Reduce Receive doorbell rate
This is similar to commit e340c2d6ef ("xprtrdma: Reduce the
doorbell rate (Receive)") which added Receive batching to the
client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:28 -05:00
Chuck Lever c6226ff9a6 svcrdma: Deprecate stat variables that are no longer used
Clean up. We are not permitted to remove old proc files. Instead,
convert these variables to stubs that are only ever allowed to
display a value of zero.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:28 -05:00
Chuck Lever 1e7e557316 svcrdma: Restore read and write stats
Now that we have an efficient mechanism to update these two stats,
let's start maintaining them again.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:28 -05:00
Chuck Lever 22df5a2246 svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter
Avoid the overhead of a memory bus lock cycle for counting a value
that is hardly every used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:28 -05:00
Chuck Lever df971cd853 svcrdma: Convert rdma_stat_recv to a per-CPU counter
Receives are frequent events. Avoid the overhead of a memory bus
lock cycle for counting a value that is hardly every used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:28 -05:00
Chuck Lever 81d2174743 SUNRPC: Move definition of XDR_UNIT
Clean up: The unit of XDR alignment is defined by RFC 4506,
not as part of the RPC message header. Thus it belongs in
include/linux/sunrpc/xdr.h.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:23 -05:00
Chuck Lever 2289e87b59 SUNRPC: Make trace_svc_process() display the RPC procedure symbolically
The next few patches will employ these strings to help make server-
side trace logs more human-readable. A similar technique is already
in use in kernel RPC client code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:23 -05:00
Linus Torvalds 74f602dc96 NFS client updates for Linux 5.11
Highlights include:
 
 Features:
 - NFSv3: Add emulation of lookupp() to improve open_by_filehandle()
   support.
 - A series of patches to improve readdir performance, particularly with
   large directories.
 - Basic support for using NFS/RDMA with the pNFS files and flexfiles
   drivers.
 - Micro-optimisations for RDMA.
 - RDMA tracing improvements.
 
 Bugfixes:
 - Fix a long standing bug with xs_read_xdr_buf() when receiving partial
   pages (Dan Aloni).
 - Various fixes for getxattr and listxattr, when used over non-TCP
   transports.
 - Fixes for containerised NFS from Sargun Dhillon.
 - switch nfsiod to be an UNBOUND workqueue (Neil Brown).
 - READDIR should not ask for security label information if there is no
   LSM policy. (Olga Kornievskaia)
 - Avoid using interval-based rebinding with TCP in lockd (Calum Mackay).
 - A series of RPC and NFS layer fixes to support the NFSv4.2 READ_PLUS code.
 - A couple of fixes for pnfs/flexfiles read failover
 
 Cleanups:
 - Various cleanups for the SUNRPC xdr code in conjunction with the
   READ_PLUS fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl/aiaIACgkQZwvnipYK
 APIOihAAvONscxrFSaGRh2ICNv9I/zXW/A5+R3qnkESPVLTqTPJVphoN7FlINAr1
 B74pg6n4T4viycbvsogU2+kHrlJZO7B8lTkJL7ynm9Wgyw8+2Ga4QEn1bsAoqmuY
 b91p/+LfOLKrYeeojoH31PC73uOYYG1WHXJhjq0l9b5CTgThWpj6O3gDaFEbFvmz
 A7V3yqSp04sV70YxUhwelBHZ5BXdiXIKsPnIwvXXHuY7IcamrE4EA3wGCwtxkBnu
 4dwbOtRXURNSev0r3n6FsH4wZl+/nvp9UpnGdPtVv94F1zm2JKLwkhoJejS/vpjq
 eyKc7ZXBQ0uHbTWI2Yj1YjA61VIUO0R0EDuyTAnRKDeaarID42n5kMG7J8cIglZR
 jQfyx99xm0eSrdwxC09tcRL/lBzYcOfc6pJo5P9BtaFtRvbp9iFIHuFKlrXbULd4
 WrZzDMhiKVYGSTcTpfQyVoK2rCvn6W1Ida4iYeI0gkJ1v9X90UhbtJOyggn/bxyL
 DV/Qy40+l48n7CZfPU2eDv4WXqjKGRibpDoWMBLwUH20dDEX6kKYv3BfApFYGqyO
 /GTPAFUZarCy8BENvzZv/Jb9mt5pDQM5p9ZXpdUOhydLMMA+pauaT/Gr+pAHPIPx
 MPj546Gh2cEaT883xvRrJmQTG0nw/WscPNcHaJcgL5oYltmuwck=
 =IKWG
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - NFSv3: Add emulation of lookupp() to improve open_by_filehandle()
     support

   - A series of patches to improve readdir performance, particularly
     with large directories

   - Basic support for using NFS/RDMA with the pNFS files and flexfiles
     drivers

   - Micro-optimisations for RDMA

   - RDMA tracing improvements

  Bugfixes:

   - Fix a long standing bug with xs_read_xdr_buf() when receiving
     partial pages (Dan Aloni)

   - Various fixes for getxattr and listxattr, when used over non-TCP
     transports

   - Fixes for containerised NFS from Sargun Dhillon

   - switch nfsiod to be an UNBOUND workqueue (Neil Brown)

   - READDIR should not ask for security label information if there is
     no LSM policy (Olga Kornievskaia)

   - Avoid using interval-based rebinding with TCP in lockd (Calum
     Mackay)

   - A series of RPC and NFS layer fixes to support the NFSv4.2
     READ_PLUS code

   - A couple of fixes for pnfs/flexfiles read failover

  Cleanups:

   - Various cleanups for the SUNRPC xdr code in conjunction with the
     READ_PLUS fixes"

* tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits)
  NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read()
  pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read
  NFSv4/pnfs: Add tracing for the deviceid cache
  fs/lockd: convert comma to semicolon
  NFSv4.2: fix error return on memory allocation failure
  NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yet
  NFSv4.2: Deal with potential READ_PLUS data extent buffer overflow
  NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow
  NFSv4.2: Handle hole lengths that exceed the READ_PLUS read buffer
  NFSv4.2: decode_read_plus_hole() needs to check the extent offset
  NFSv4.2: decode_read_plus_data() must skip padding after data segment
  NFSv4.2: Ensure we always reset the result->count in decode_read_plus()
  SUNRPC: When expanding the buffer, we may need grow the sparse pages
  SUNRPC: Cleanup - constify a number of xdr_buf helpers
  SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' field
  SUNRPC: _copy_to/from_pages() now check for zero length
  SUNRPC: Cleanup xdr_shrink_bufhead()
  SUNRPC: Fix xdr_expand_hole()
  SUNRPC: Fixes for xdr_align_data()
  SUNRPC: _shift_data_left/right_pages should check the shift length
  ...
2020-12-17 12:15:03 -08:00
Trond Myklebust f8d0e60f10 SUNRPC: Cleanup - constify a number of xdr_buf helpers
There are a number of xdr helpers for struct xdr_buf that do not change
the structure itself. Mark those as taking const pointers for
documentation purposes.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-14 06:51:07 -05:00
Trond Myklebust c4f2f591f0 SUNRPC: Fix xdr_expand_hole()
We do want to try to grow the buffer if possible, but if that attempt
fails, we still want to move the data and truncate the XDR message.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-14 06:51:07 -05:00
Trond Myklebust 9a20f6f4e6 SUNRPC: Fixes for xdr_align_data()
The main use case right now for xdr_align_data() is to shift the page
data to the left, and in practice shrink the total XDR data buffer.
This patch ensures that we fix up the accounting for the buffer length
as we shift that data around.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-14 06:51:07 -05:00
Trond Myklebust c87b056e58 SUNRPC: Remove unused function xprt_load_transport()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02 14:05:53 -05:00
Trond Myklebust 1fc5f13186 SUNRPC: Add a helper to return the transport identifier given a netid
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02 14:05:52 -05:00
Trond Myklebust d5aa6b22e2 SUNRPC: xprt_load_transport() needs to support the netid "rdma6"
According to RFC5666, the correct netid for an IPv6 addressed RDMA
transport is "rdma6", which we've supported as a mount option since
Linux-4.7. The problem is when we try to load the module "xprtrdma6",
that will fail, since there is no modulealias of that name.

Fixes: 181342c5eb ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02 14:05:52 -05:00
Chuck Lever 8918cc0d2b NFSD: Add helper for decoding locker4
Refactor for clarity.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 14:46:38 -05:00
Chuck Lever cbd9abb370 NFSD: Replace READ* macros in nfsd4_decode_commit()
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 14:46:36 -05:00
Chuck Lever c1346a1216 NFSD: Replace the internals of the READ_BUF() macro
Convert the READ_BUF macro in nfs4xdr.c from open code to instead
use the new xdr_stream-style decoders already in use by the encode
side (and by the in-kernel NFS client implementation). Once this
conversion is done, each individual NFSv4 argument decoder can be
independently cleaned up to replace these macros with C code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 14:46:36 -05:00
Chuck Lever 5191955d6f SUNRPC: Prepare for xdr_stream-style decoding on the server-side
A "permanent" struct xdr_stream is allocated in struct svc_rqst so
that it is usable by all server-side decoders. A per-rqst scratch
buffer is also allocated to handle decoding XDR data items that
cross page boundaries.

To demonstrate how it will be used, add the first call site for the
new svcxdr_init_decode() API.

As an additional part of the overall conversion, add symbolic
constants for successful and failed XDR operations. Returning "0" is
overloaded. Sometimes it means something failed, but sometimes it
means success. To make it more clear when XDR decoding functions
succeed or fail, introduce symbolic constants.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-11-30 14:46:35 -05:00