Commit Graph

4011 Commits (c4b9570cfb63501638db720f3bee9f6dfd044b82)

Author SHA1 Message Date
Linus Torvalds 96c84703f1 drm next for 6.14-rc1
core:
 - device memory cgroup controller added
 - Remove driver date from drm_driver
 - Add drm_printer based hex dumper
 - drm memory stats docs update
 - scheduler documentation improvements
 
 new driver:
 - amdxdna - Ryzen AI NPU support
 
 connector:
 - add a mutex to protect ELD
 - make connector setup two-step
 
 panels:
 - Introduce backlight quirks infrastructure
 - New panels: KDB KD116N2130B12, Tianma TM070JDHG34-00,
 - Multi-Inno Technology MI1010Z1T-1CP11
 
 bridge:
 - ti-sn65dsi83: Add ti,lvds-vod-swing optional properties
 - Provide default implementation of atomic_check for HDMI bridges
 - it605: HDCP improvements, MCCS Support
 
 xe:
 - make OA buffer size configurable
 - GuC capture fixes
 - add ufence and g2h flushes
 - restore system memory GGTT mappings
 - ioctl fixes
 - SRIOV PF scheduling priority
 - allow fault injection
 - lots of improvements/refactors
 - Enable GuC's WA_DUAL_QUEUE for newer platforms
 - IRQ related fixes and improvements
 
 i915:
 - More accurate engine busyness metrics with GuC submission
 - Ensure partial BO segment offset never exceeds allowed max
 - Flush GuC CT receive tasklet during reset preparation
 - Some DG2 refactor to fix DG2 bugs when operating with certain CPUs
 - Fix DG1 power gate sequence
 - Enabling uncompressed 128b/132b UHBR SST
 - Handle hdmi connector init failures, and no HDMI/DP cases
 - More robust engine resets on Haswell and older
 
 i915/xe display:
 - HDCP fixes for Xe3Lpd
 - New GSC FW ARL-H/ARL-U
 - support 3 VDSC engines 12 slices
 - MBUS joining sanitisation
 - reconcile i915/xe display power mgmt
 - Xe3Lpd fixes
 - UHBR rates for Thunderbolt
 
 amdgpu:
 - DRM panic support
 - track BO memory stats at runtime
 - Fix max surface handling in DC
 - Cleaner shader support for gfx10.3 dGPUs
 - fix drm buddy trim handling
 - SDMA engine reset updates
 - Fix doorbell ttm cleanup
 - RAS updates
 - ISP updates
 - SDMA queue reset support
 - Rework DPM powergating interfaces
 - Documentation updates and cleanups
 - DCN 3.5 updates
 - Use a pm notifier to more gracefully handle VRAM eviction on suspend or hibernate
 - Add debugfs interfaces for forcing scheduling to specific engine instances
 - GG 9.5 updates
 - IH 4.4 updates
 - Make missing optional firmware less noisy
 - PSP 13.x updates
 - SMU 13.x updates
 - VCN 5.x updates
 - JPEG 5.x updates
 - GC 12.x updates
 - DC FAMS updates
 
 amdkfd:
 - GG 9.5 updates
 - Logging improvements
 - Shader debugger fixes
 - Trap handler cleanup
 - Cleanup includes
 - Eviction fence wq fix
 
 msm:
 - MDSS:
 - properly described UBWC registers
 - added SM6150 (aka QCS615) support
 - DPU:
 - added SM6150 (aka QCS615) support
 - enabled wide planes if virtual planes are enabled (by using two SSPPs for a single plane)
 - added CWB hardware blocks support
 - DSI:
 - added SM6150 (aka QCS615) support
 - GPU:
 - Print GMU core fw version
 - GMU bandwidth voting for a740 and a750
 - Expose uche trap base via uapi
 - UAPI error reporting
 
 rcar-du:
 - Add r8a779h0 Support
 
 ivpu:
 - Fix qemu crash when using passthrough
 
 nouveau:
 - expose GSP-RM logging buffers via debugfs
 
 panfrost:
 - Add MT8188 Mali-G57 MC3 support
 
 rockchip:
 - Gamma LUT support
 
 hisilicon:
 - new HIBMC support
 
 virtio-gpu:
 - convert to helpers
 - add prime support for scanout buffers
 
 v3d:
 - Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
 
 vc4:
 - Add support for BCM2712
 
 vkms:
 - line-per-line compositing algorithm to improve performance
 
 zynqmp:
 - Add DP audio support
 
 mediatek:
 - dp: Add sdp path reset
 - dp: Support flexible length of DP calibration data
 
 etnaviv:
 - add fdinfo memory support
 - add explicit reset handling
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmeJ5qYACgkQDHTzWXnE
 hr4o+w/9EbijDfyf8GCj4Qaxov8nZ3KEMW8LLmrYO3epfLsniX+nv01oNdbRXBjl
 QcsKixAvkyfLl61RuPnwbYiSJfxgwZ5K8rke7cshwlMB7zl7xZ+GZRoAmJlnokS4
 uhmclCriW5nfKRNAGUPcj/ReGZeyHwqvGZn3jyuShkIFpE4rDope4DQsTzm/zs/i
 +cKyRAFm86EIdTACr9DVtb1L5uNZOnHDkufRH5EZr/7CWFco1krLxb/r4cvFaiIO
 GiDaLvXKXKwzQ6NeIWWCEU2zTBz0BluI8ggxp1+WlDiYgLDWtCBpBNPAoNJO/iQS
 J+E8bsk2b/aCLSJQgxcK0y80CXpoJyALaqStdHUqxuWv3/o0g8lFUJlfJVCNPIsg
 o4mBkdbgkzkHCPxUbie7uQIx+2DIsEiwWC/YGBeRx49qEYsLWyFHf6JR8j9aHCQq
 eGanaubzR+W2AC81yktd3rcxpmX5kq8n6ax3ZtS9wnio8iyB5jBDM8QeFSAE/vXV
 B5TT1nneh+HXJ6bTwZBFXkiq2JRxUdbZIS5oQLh0zixVthBMISSsYhJ222nH1bC4
 DWIS2ggqSgqkb0WsE29CJyhJ1fPmS3v7lBXqPvjmN5vMto4gGOJAEgT6CiDpGFIz
 zXzNfrirr1r95iSST4PnYVOOkfK3t9gvbWMXgkr0wygtxyoxHzk=
 =5FIc
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "There are two external interactions of note, the msm tree pull in some
  opp tree, hopefully the opp tree arrives from the same git tree
  however it normally does.

  There is also a new cgroup controller for device memory, that is used
  by drm, so is merging through my tree. This will hopefully help open
  up gpu cgroup usage a bit more and move us forward.

  There is a new accelerator driver for the AMD XDNA Ryzen AI NPUs.

  Then the usual xe/amdgpu/i915/msm leaders and lots of changes and
  refactors across the board:

  core:
   - device memory cgroup controller added
   - Remove driver date from drm_driver
   - Add drm_printer based hex dumper
   - drm memory stats docs update
   - scheduler documentation improvements

  new driver:
   - amdxdna - Ryzen AI NPU support

  connector:
   - add a mutex to protect ELD
   - make connector setup two-step

  panels:
   - Introduce backlight quirks infrastructure
   - New panels: KDB KD116N2130B12, Tianma TM070JDHG34-00,
   - Multi-Inno Technology MI1010Z1T-1CP11

  bridge:
   - ti-sn65dsi83: Add ti,lvds-vod-swing optional properties
   - Provide default implementation of atomic_check for HDMI bridges
   - it605: HDCP improvements, MCCS Support

  xe:
   - make OA buffer size configurable
   - GuC capture fixes
   - add ufence and g2h flushes
   - restore system memory GGTT mappings
   - ioctl fixes
   - SRIOV PF scheduling priority
   - allow fault injection
   - lots of improvements/refactors
   - Enable GuC's WA_DUAL_QUEUE for newer platforms
   - IRQ related fixes and improvements

  i915:
   - More accurate engine busyness metrics with GuC submission
   - Ensure partial BO segment offset never exceeds allowed max
   - Flush GuC CT receive tasklet during reset preparation
   - Some DG2 refactor to fix DG2 bugs when operating with certain CPUs
   - Fix DG1 power gate sequence
   - Enabling uncompressed 128b/132b UHBR SST
   - Handle hdmi connector init failures, and no HDMI/DP cases
   - More robust engine resets on Haswell and older

  i915/xe display:
   - HDCP fixes for Xe3Lpd
   - New GSC FW ARL-H/ARL-U
   - support 3 VDSC engines 12 slices
   - MBUS joining sanitisation
   - reconcile i915/xe display power mgmt
   - Xe3Lpd fixes
   - UHBR rates for Thunderbolt

  amdgpu:
   - DRM panic support
   - track BO memory stats at runtime
   - Fix max surface handling in DC
   - Cleaner shader support for gfx10.3 dGPUs
   - fix drm buddy trim handling
   - SDMA engine reset updates
   - Fix doorbell ttm cleanup
   - RAS updates
   - ISP updates
   - SDMA queue reset support
   - Rework DPM powergating interfaces
   - Documentation updates and cleanups
   - DCN 3.5 updates
   - Use a pm notifier to more gracefully handle VRAM eviction on
     suspend or hibernate
   - Add debugfs interfaces for forcing scheduling to specific engine
     instances
   - GG 9.5 updates
   - IH 4.4 updates
   - Make missing optional firmware less noisy
   - PSP 13.x updates
   - SMU 13.x updates
   - VCN 5.x updates
   - JPEG 5.x updates
   - GC 12.x updates
   - DC FAMS updates

  amdkfd:
   - GG 9.5 updates
   - Logging improvements
   - Shader debugger fixes
   - Trap handler cleanup
   - Cleanup includes
   - Eviction fence wq fix

  msm:
   - MDSS:
      - properly described UBWC registers
      - added SM6150 (aka QCS615) support
   - DPU:
      - added SM6150 (aka QCS615) support
      - enabled wide planes if virtual planes are enabled (by using two
        SSPPs for a single plane)
      - added CWB hardware blocks support
   - DSI:
      - added SM6150 (aka QCS615) support
   - GPU:
      - Print GMU core fw version
      - GMU bandwidth voting for a740 and a750
      - Expose uche trap base via uapi
      - UAPI error reporting

  rcar-du:
   - Add r8a779h0 Support

  ivpu:
   - Fix qemu crash when using passthrough

  nouveau:
   - expose GSP-RM logging buffers via debugfs

  panfrost:
   - Add MT8188 Mali-G57 MC3 support

  rockchip:
   - Gamma LUT support

  hisilicon:
   - new HIBMC support

  virtio-gpu:
   - convert to helpers
   - add prime support for scanout buffers

  v3d:
   - Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL

  vc4:
   - Add support for BCM2712

  vkms:
   - line-per-line compositing algorithm to improve performance

  zynqmp:
   - Add DP audio support

  mediatek:
   - dp: Add sdp path reset
   - dp: Support flexible length of DP calibration data

  etnaviv:
   - add fdinfo memory support
   - add explicit reset handling"

* tag 'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel: (1070 commits)
  drm/bridge: fix documentation for the hdmi_audio_prepare() callback
  doc/cgroup: Fix title underline length
  drm/doc: Include new drm-compute documentation
  cgroup/dmem: Fix parameters documentation
  cgroup/dmem: Select PAGE_COUNTER
  kernel/cgroup: Remove the unused variable climit
  drm/display: hdmi: Do not read EDID on disconnected connectors
  drm/tests: hdmi: Add connector disablement test
  drm/connector: hdmi: Do atomic check when necessary
  drm/amd/display: 3.2.316
  drm/amd/display: avoid reset DTBCLK at clock init
  drm/amd/display: improve dpia pre-train
  drm/amd/display: Apply DML21 Patches
  drm/amd/display: Use HW lock mgr for PSR1
  drm/amd/display: Revised for Replay Pseudo vblank control
  drm/amd/display: Add a new flag for replay low hz
  drm/amd/display: Remove unused read_ono_state function from Hwss module
  drm/amd/display: Do not elevate mem_type change to full update
  drm/amd/display: Do not wait for PSR disable on vbl enable
  drm/amd/display: Remove unnecessary eDP power down
  ...
2025-01-21 16:09:47 -08:00
Linus Torvalds 0eb4aaa230 for-6.14-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmeHvVQACgkQxWXV+ddt
 WDsJ6w//cPqI8tf3kMxurZcG7clJRIIISotPrC6hm3UDNpJLa7HDaVJ50FAoIhMV
 sB4RQNZky4mfB6ypXxmETzV3ZHvP0+oFgRs72Ommi0ZbdnBgxhaUTrDXLKl52o4r
 UoeqvRKReEYOesN09rPXYPwytUOkxHU/GjNzv7bC/Tzvq/xKaIN5qMYZwkHtJ8PK
 JtCFypfbmDPNDJz37l0BhRya2oMtpcUtxM9uP8RWVuQtaELgjcy56W/+osoyJTy9
 FSKaoWUPsDVDufnILlGR8Kub2Z5mcISVqyARUdr/q3j5CDfyTdQvahmUy7sHgUAe
 HGh5QBdRJu1QTvdZw+nK4YCaYpK6Nj4liDtO1cwVitde5RXsJrt6kYBLlY/kU2Qr
 KODOloM/zVKxULR0ARl11NULZquUsczP6Wxfn+dtyDJ3JGlY9OcuESmorHoUtkMX
 75Tj1AtRMNcfZAE2HquL1Oz3bIMcg4btDJsC+9Yp5K11SP12XpOwC42k/9Bx3iBe
 Iki0BSuppFqX5MMY3OEWzD1pz2vOGYR8ISD6EIsjpjl2vBeRwydaCCZfuszSC7gl
 Y4goSdwFMPVlqllL1h27XUjKVXvttCqqdB6P28MbvZKnFAPlm189BJQZC5cbHAJU
 ceBww5PvI9QxnJnFG5iOLcnko6liUWPP9l2c5LLtUsJIi8B5Hu0=
 =SXLv
 -----END PGP SIGNATURE-----

Merge tag 'for-6.14-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "User visible changes, features:

   - rebuilding of the free space tree at mount time is done in more
     transactions, fix potential hangs when the transaction thread is
     blocked due to large amount of block groups

   - more read IO balancing strategies (experimental config), add two
     new ways how to select a device for read if the profiles allow that
     (all RAID1*), the current default selects the device by pid which
     is good on average but less performant for single reader workloads

       - select preferred device for all reads (namely for testing)
       - round-robin, balance reads across devices relevant for the
         requested IO range

   - add encoded write ioctl support to io_uring (read was added in
     6.12), basis for writing send stream using that instead of
     syscalls, non-blocking mode is not yet implemented

   - support FS_IOC_READ_VERITY_METADATA, applications can use the
     metadata to do their own verification

   - pass inode's i_write_hint to bios, for parity with other
     filesystems, ioctls F_GET_RW_HINT/F_SET_RW_HINT

  Core:

   - in zoned mode: allow to directly reclaim a block group by simply
     resetting it, then it can be reused and another block group does
     not need to be allocated

   - super block validation now also does more comprehensive sys array
     validation, adding it to the points where superblock is validated
     (post-read, pre-write)

   - subpage mode fixes:
      - fix double accounting of blocks due to some races
      - improved or fixed error handling in a few cases (compression,
        delalloc)

   - raid stripe tree:
      - fix various cases with extent range splitting or deleting
      - implement hole punching to extent range
      - reduce number of stripe tree lookups during bio submission
      - more self-tests

   - updated self-tests (delayed refs)

   - error handling improvements

   - cleanups, refactoring
      - remove rest of backref caching infrastructure from relocation,
        not needed anymore
      - error message updates
      - remove unnecessary calls when extent buffer was marked dirty
      - unused parameter removal
      - code moved to new files

  Other code changes: add rb_find_add_cached() to the rb-tree API"

* tag 'for-6.14-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (127 commits)
  btrfs: selftests: add a selftest for deleting two out of three extents
  btrfs: selftests: add test for punching a hole into 3 RAID stripe-extents
  btrfs: selftests: add selftest for punching holes into the RAID stripe extents
  btrfs: selftests: test RAID stripe-tree deletion spanning two items
  btrfs: selftests: don't split RAID extents in half
  btrfs: selftests: check for correct return value of failed lookup
  btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents
  btrfs: implement hole punching for RAID stripe extents
  btrfs: fix deletion of a range spanning parts two RAID stripe extents
  btrfs: fix tail delete of RAID stripe-extents
  btrfs: fix front delete range calculation for RAID stripe extents
  btrfs: assert RAID stripe-extent length is always greater than 0
  btrfs: don't try to delete RAID stripe-extents if we don't need to
  btrfs: selftests: correct RAID stripe-tree feature flag setting
  btrfs: add io_uring interface for encoded writes
  btrfs: remove the unused locked_folio parameter from btrfs_cleanup_ordered_extents()
  btrfs: add extra error messages for delalloc range related errors
  btrfs: subpage: dump the involved bitmap when ASSERT() failed
  btrfs: subpage: fix the bitmap dump of the locked flags
  btrfs: do proper folio cleanup when run_delalloc_nocow() failed
  ...
2025-01-20 13:09:30 -08:00
Linus Torvalds b971424b6e vfs-6.14-rc1.afs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ4pS3AAKCRCRxhvAZXjc
 okSwAPkB8Ra+oTplB/yzmab5kFB0+IUSHAiBfG6TCYb45op7wgEAs4+ignZkb+Bi
 PsrfV7soiTGNUYSDVKOw7LS6PJEzkgA=
 =3mcq
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.14-rc1.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull afs updates from Christian Brauner:
 "Dynamic root improvements:

   - Create an /afs/.<cell> mountpoint to match the /afs/<cell>
     mountpoint when a cell is created

   - Add some more checks on cell names proposed by the user to prevent
     dodgy symlink bodies from being created. Also prevent rootcell from
     being altered once set to simplify the locking

   - Change the handling of /afs/@cell from being a dentry name
     substitution at lookup time to making it a symlink to the current
     cell name and also provide a /afs/.@cell symlink to point to the
     dotted cell mountpoint

  Fixes:

   - Fix the abort code check in the fallback handling for the
     YFS.RemoveFile2 RPC call

   - Use call->op->server() for oridnary filesystem RPC calls that have
     an operation descriptor instead of call->server()"

* tag 'vfs-6.14-rc1.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  afs: Fix the fallback handling for the YFS.RemoveFile2 RPC call
  afs: Make /afs/@cell and /afs/.@cell symlinks
  afs: Add rootcell checks
  afs: Make /afs/.<cell> as well as /afs/<cell> mountpoints
2025-01-20 11:40:48 -08:00
Linus Torvalds ca56a74a31 vfs-6.14-rc1.netfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ4pRKQAKCRCRxhvAZXjc
 ov2dAQCULWjTBWdF8Ro2bfNeXzWvUUnSPjoLJ9B4xlrOB9c2MAEAiwkKHkzAxUco
 hCvaRJc3H2ze2wrgbIABPKB2noQVVwk=
 =4ojv
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs netfs updates from Christian Brauner:
 "This contains read performance improvements and support for monolithic
  single-blob objects that have to be read/written as such (e.g. AFS
  directory contents). The implementation of the two parts is interwoven
  as each makes the other possible.

   - Read performance improvements

     The read performance improvements are intended to speed up some
     loss of performance detected in cifs and to a lesser extend in afs.

     The problem is that we queue too many work items during the
     collection of read results: each individual subrequest is collected
     by its own work item, and then they have to interact with each
     other when a series of subrequests don't exactly align with the
     pattern of folios that are being read by the overall request.

     Whilst the processing of the pages covered by individual
     subrequests as they complete potentially allows folios to be woken
     in parallel and with minimum delay, it can shuffle wakeups for
     sequential reads out of order - and that is the most common I/O
     pattern.

     The final assessment and cleanup of an operation is then held up
     until the last I/O completes - and for a synchronous sequential
     operation, this means the bouncing around of work items just adds
     latency.

     Two changes have been made to make this work:

     (1) All collection is now done in a single "work item" that works
         progressively through the subrequests as they complete (and
         also dispatches retries as necessary).

     (2) For readahead and AIO, this work item be done on a workqueue
         and can run in parallel with the ultimate consumer of the data;
         for synchronous direct or unbuffered reads, the collection is
         run in the application thread and not offloaded.

     Functions such as smb2_readv_callback() then just tell netfslib
     that the subrequest has terminated; netfslib does a minimal bit of
     processing on the spot - stat counting and tracing mostly - and
     then queues/wakes up the worker. This simplifies the logic as the
     collector just walks sequentially through the subrequests as they
     complete and walks through the folios, if buffered, unlocking them
     as it goes. It also keeps to a minimum the amount of latency
     injected into the filesystem's low-level I/O handling

     The way netfs supports filesystems using the deprecated
     PG_private_2 flag is changed: folios are flagged and added to a
     write request as they complete and that takes care of scheduling
     the writes to the cache. The originating read request can then just
     unlock the pages whatever happens.

   - Single-blob object support

     Single-blob objects are files for which the content of the file
     must be read from or written to the server in a single operation
     because reading them in parts may yield inconsistent results. AFS
     directories are an example of this as there exists the possibility
     that the contents are generated on the fly and would differ between
     reads or might change due to third party interference.

     Such objects will be written to and retrieved from the cache if one
     is present, though we allow/may need to propose multiple
     subrequests to do so. The important part is that read from/write to
     the *server* is monolithic.

     Single blob reading is, for the moment, fully synchronous and does
     result collection in the application thread and, also for the
     moment, the API is supplied the buffer in the form of a folio_queue
     chain rather than using the pagecache.

   - Related afs changes

     This series makes a number of changes to the kafs filesystem,
     primarily in the area of directory handling:

      - AFS's FetchData RPC reply processing is made partially
        asynchronous which allows the netfs_io_request's outstanding
        operation counter to be removed as part of reducing the
        collection to a single work item.

      - Directory and symlink reading are plumbed through netfslib using
        the single-blob object API and are now cacheable with fscache.
        This also allows the afs_read struct to be eliminated and
        netfs_io_subrequest to be used directly instead.

      - Directory and symlink content are now stored in a folio_queue
        buffer rather than in the pagecache. This means we don't require
        the RCU read lock and xarray iteration to access it, and folios
        won't randomly disappear under us because the VM wants them
        back.

      - The vnode operation lock is changed from a mutex struct to a
        private lock implementation. The problem is that the lock now
        needs to be dropped in a separate thread and mutexes don't
        permit that.

      - When a new directory or symlink is created, we now initialise it
        locally and mark it valid rather than downloading it (we know
        what it's likely to look like).

      - We now use the in-directory hashtable to reduce the number of
        entries we need to scan when doing a lookup. The edit routines
        have to maintain the hash chains.

      - Cancellation (e.g. by signal) of an async call after the
        rxrpc_call has been set up is now offloaded to the worker thread
        as there will be a notification from rxrpc upon completion. This
        avoids a double cleanup.

   - A "rolling buffer" implementation is created to abstract out the
     two separate folio_queue chaining implementations I had (one for
     read and one for write).

   - Functions are provided to create/extend a buffer in a folio_queue
     chain and tear it down again.

     This is used to handle AFS directories, but could also be used to
     create bounce buffers for content crypto and transport crypto.

   - The was_async argument is dropped from netfs_read_subreq_terminated()

     Instead we wake the read collection work item by either queuing it
     or waking up the app thread.

   - We don't need to use BH-excluding locks when communicating between
     the issuing thread and the collection thread as neither of them now
     run in BH context.

   - Also included are a number of new tracepoints; a split of the
     netfslib write collection code to put retrying into its own file
     (it gets more complicated with content encryption).

   - There are also some minor fixes AFS included, including fixing the
     AFS directory format struct layout, reducing some directory
     over-invalidation and making afs_mkdir() translate EEXIST to
     ENOTEMPY (which is not available on all systems the servers
     support).

   - Finally, there's a patch to try and detect entry into the folio
     unlock function with no folio_queue structs in the buffer (which
     isn't allowed in the cases that can get there).

     This is a debugging patch, but should be minimal overhead"

* tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
  netfs: Report on NULL folioq in netfs_writeback_unlock_folios()
  afs: Add a tracepoint for afs_read_receive()
  afs: Locally initialise the contents of a new symlink on creation
  afs: Use the contained hashtable to search a directory
  afs: Make afs_mkdir() locally initialise a new directory's content
  netfs: Change the read result collector to only use one work item
  afs: Make {Y,}FS.FetchData an asynchronous operation
  afs: Fix cleanup of immediately failed async calls
  afs: Eliminate afs_read
  afs: Use netfslib for symlinks, allowing them to be cached
  afs: Use netfslib for directories
  afs: Make afs_init_request() get a key if not given a file
  netfs: Add support for caching single monolithic objects such as AFS dirs
  netfs: Add functions to build/clean a buffer in a folio_queue
  afs: Add more tracepoints to do with tracking validity
  cachefiles: Add auxiliary data trace
  cachefiles: Add some subrequest tracepoints
  netfs: Remove some extraneous directory invalidations
  afs: Fix directory format encoding struct
  afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTY
  ...
2025-01-20 09:29:11 -08:00
Linus Torvalds fda5e3f284 Fix regression in GFP output in trace events
It was reported that the GFP flags in trace events went from human
 readable to just their hex values:
 
   gfp_flags=GFP_HIGHUSER_MOVABLE|__GFP_COMP to gfp_flags=0x140cca
 
 This was caused by a change that added the use of enums in calculating
 the GFP flags. As defines get translated into their values in the
 trace event format files, the user space tooling could easily convert
 the GFP flags into their symbols via the __print_flags() helper macro.
 The problem is that enums do not get converted, and the names of the
 enums show up in the format files and user space tooling cannot translate
 them.
 
 Add TRACE_DEFINE_ENUM() around the enums used for GFP flags which is the
 tracing infrastructure macro that informs the tracing subsystem what
 the values for enums and it can then expose that to user space.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ4u7AxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qgIkAP0VVW80Ck5K9hpDJ3SvSgaGDntSegY7
 lI0ExVqGsJz8GQEAzkaRjgGXuXfzGzA9K7ZUe9X4R8W0Xkl9GisvqqEU1Ak=
 =rzFM
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.13-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix regression in GFP output in trace events

  It was reported that the GFP flags in trace events went from human
  readable to just their hex values:

      gfp_flags=GFP_HIGHUSER_MOVABLE|__GFP_COMP to gfp_flags=0x140cca

  This was caused by a change that added the use of enums in calculating
  the GFP flags.

  As defines get translated into their values in the trace event format
  files, the user space tooling could easily convert the GFP flags into
  their symbols via the __print_flags() helper macro.

  The problem is that enums do not get converted, and the names of the
  enums show up in the format files and user space tooling cannot
  translate them.

  Add TRACE_DEFINE_ENUM() around the enums used for GFP flags which is
  the tracing infrastructure macro that informs the tracing subsystem
  what the values for enums and it can then expose that to user space"

* tag 'trace-v6.13-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: gfp: Fix the GFP enum values shown for user space tracing tools
2025-01-18 13:22:53 -08:00
Steven Rostedt 60295b944f tracing: gfp: Fix the GFP enum values shown for user space tracing tools
Tracing tools like perf and trace-cmd read the /sys/kernel/tracing/events/*/*/format
files to know how to parse the data and also how to print it. For the
"print fmt" portion of that file, if anything uses an enum that is not
exported to the tracing system, user space will not be able to parse it.

The GFP flags use to be defines, and defines get translated in the print
fmt sections. But now they are converted to use enums, which is not.

The mm_page_alloc trace event format use to have:

  print fmt: "page=%p pfn=0x%lx order=%d migratetype=%d gfp_flags=%s",
    REC->pfn != -1UL ? (((struct page *)vmemmap_base) + (REC->pfn)) : ((void
    *)0), REC->pfn != -1UL ? REC->pfn : 0, REC->order, REC->migratetype,
    (REC->gfp_flags) ? __print_flags(REC->gfp_flags, "|", {( unsigned
    long)(((((((( gfp_t)(0x400u|0x800u)) | (( gfp_t)0x40u) | (( gfp_t)0x80u) |
    (( gfp_t)0x100000u)) | (( gfp_t)0x02u)) | (( gfp_t)0x08u) | (( gfp_t)0)) |
    (( gfp_t)0x40000u) | (( gfp_t)0x80000u) | (( gfp_t)0x2000u)) & ~((
    gfp_t)(0x400u|0x800u))) | (( gfp_t)0x400u)), "GFP_TRANSHUGE"}, {( unsigned
    long)((((((( gfp_t)(0x400u|0x800u)) | (( gfp_t)0x40u) | (( gfp_t)0x80u) |
    (( gfp_t)0x100000u)) | (( gfp_t)0x02u)) | (( gfp_t)0x08u) | (( gfp_t)0)) ...

Where the GFP values are shown and not their names. But after the GFP
flags were converted to use enums, it has:

  print fmt: "page=%p pfn=0x%lx order=%d migratetype=%d gfp_flags=%s",
    REC->pfn != -1UL ? (vmemmap + (REC->pfn)) : ((void *)0), REC->pfn != -1UL
    ? REC->pfn : 0, REC->order, REC->migratetype, (REC->gfp_flags) ?
    __print_flags(REC->gfp_flags, "|", {( unsigned long)((((((((
    gfp_t)(((((1UL))) << (___GFP_DIRECT_RECLAIM_BIT))|((((1UL))) <<
    (___GFP_KSWAPD_RECLAIM_BIT)))) | (( gfp_t)((((1UL))) << (___GFP_IO_BIT)))
    | (( gfp_t)((((1UL))) << (___GFP_FS_BIT))) | (( gfp_t)((((1UL))) <<
    (___GFP_HARDWALL_BIT)))) | (( gfp_t)((((1UL))) << (___GFP_HIGHMEM_BIT))))
    | (( gfp_t)((((1UL))) << (___GFP_MOVABLE_BIT))) | (( gfp_t)0)) | ((
    gfp_t)((((1UL))) << (___GFP_COMP_BIT))) ...

Where the enums names like ___GFP_KSWAPD_RECLAIM_BIT are shown and not their
values. User space has no way to convert these names to their values and
the output will fail to parse. What is shown is now:

  mm_page_alloc:  page=0xffffffff981685f3 pfn=0x1d1ac1 order=0 migratetype=1 gfp_flags=0x140cca

The TRACE_DEFINE_ENUM() macro was created to handle enums in the print fmt
files. This causes them to be replaced at boot up with the numbers, so
that user space tooling can parse it. By using this macro, the output is
back to the human readable:

  mm_page_alloc: page=0xffffffff981685f3 pfn=0x122233 order=0 migratetype=1 gfp_flags=GFP_HIGHUSER_MOVABLE|__GFP_COMP

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Veronika  Molnarova <vmolnaro@redhat.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20250116214438.749504792@goodmis.org
Reported-by: Michael Petlan <mpetlan@redhat.com>
Closes: https://lore.kernel.org/all/87be5f7c-1a0-dad-daa0-54e342efaea7@redhat.com/
Fixes: 772dd03427 ("mm: enumerate all gfp flags")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-17 16:15:39 -05:00
Naohiro Aota 453a73c306 btrfs: zoned: reclaim unused zone by zone resetting
On the zoned mode, once used and freed region is still not reusable after the
freeing. The underlying zone needs to be reset before reusing. Btrfs resets a
zone when it removes a block group, and then new block group is allocated on
the zones to reuse the zones. But, it is sometime too late to catch up with a
write side.

This commit introduces a new space-info reclaim method ZONE_RESET. That will
pick a block group from the unused list and reset its zone to reuse the
zone_unusable space. It is faster than removing the block group and re-creating
a new block group on the same zones.

For the first implementation, the ZONE_RESET is only applied to a block group
whose region is fully zone_unusable. Reclaiming partial zone_unusable block
group could be implemented later.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13 14:53:14 +01:00
Muchun Song 9ab96b524d hugetlb: fix NULL pointer dereference in trace_hugetlbfs_alloc_inode
hugetlb_file_setup() will pass a NULL @dir to hugetlbfs_get_inode(), so we
will access a NULL pointer for @dir.  Fix it and set __entry->dr to 0 if
@dir is NULL.  Because ->i_ino cannot be 0 (see get_next_ino()), there is
no confusing if user sees a 0 inode number.

Link: https://lkml.kernel.org/r/20250106033118.4640-1-songmuchun@bytedance.com
Fixes: 318580ad7f ("hugetlbfs: support tracepoint")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reported-by: Cheung Wall <zzqq0103.hey@gmail.com>
Closes: https://lore.kernel.org/linux-mm/02858D60-43C1-4863-A84F-3C76A8AF1F15@linux.dev/T/#
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Cc: cheung wall <zzqq0103.hey@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12 19:03:36 -08:00
David Howells 30bca65bbb
afs: Make /afs/@cell and /afs/.@cell symlinks
Make /afs/@cell a symlink in the /afs dynamic root to match what other AFS
clients do rather than doing a substitution in the dentry name.  This has
the bonus of being tab-expandable also.

Further, provide a /afs/.@cell symlink to point to the dotted cell share.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20250107183454.608451-4-dhowells@redhat.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-10 14:54:08 +01:00
David Howells 3c49e529e1
afs: Add a tracepoint for afs_read_receive()
Add a tracepoint for afs_read_receive() to allow potential missed wakeups
to be debugged.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-32-dhowells@redhat.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:09 +01:00
David Howells 836bb70bde
afs: Make afs_mkdir() locally initialise a new directory's content
Initialise a new directory's content when it is created by mkdir locally
rather than downloading the content from the server as we can predict what
it's going to look like.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-29-dhowells@redhat.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:09 +01:00
David Howells e2d46f2ec3
netfs: Change the read result collector to only use one work item
Change the way netfslib collects read results to do all the collection for
a particular read request using a single work item that walks along the
subrequest queue as subrequests make progress or complete, unlocking folios
progressively rather than doing the unlock in parallel as parallel requests
come in.

The code is remodelled to be more like the write-side code, though only
using a single stream.  This makes it more directly comparable and thus
easier to duplicate fixes between the two sides.

This has a number of advantages:

 (1) It's simpler.  There doesn't need to be a complex donation mechanism
     to handle mismatches between the size and alignment of subrequests and
     folios.  The collector unlocks folios as the subrequests covering each
     complete.

 (2) It should cause less scheduler overhead as there's a single work item
     in play unlocking pages in parallel when a read gets split up into a
     lot of subrequests instead of one per subrequest.

     Whilst the parallellism is nice in theory, in practice, the vast
     majority of loads are sequential reads of the whole file, so
     committing a bunch of threads to unlocking folios out of order doesn't
     help in those cases.

 (3) It should make it easier to implement content decryption.  A folio
     cannot be decrypted until all the requests that contribute to it have
     completed - and, again, most loads are sequential and so, most of the
     time, we want to begin decryption sequentially (though it's great if
     the decryption can happen in parallel).

There is a disadvantage in that we're losing the ability to decrypt and
unlock things on an as-things-arrive basis which may affect some
applications.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-28-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:08 +01:00
David Howells 9750be93b2
afs: Fix cleanup of immediately failed async calls
If we manage to begin an async call, but fail to transmit any data on it
due to a signal, we then abort it which causes a race between the
notification of call completion from rxrpc and our attempt to cancel the
notification.  The notification will be necessary, however, for async
FetchData to terminate the netfs subrequest.

However, since we get a notification from rxrpc upon completion of a call
(aborted or otherwise), we can just leave it to that.

This leads to calls not getting cleaned up, but appearing in
/proc/net/rxrpc/calls as being aborted with code 6.

Fix this by making the "error_do_abort:" case of afs_make_call() abort the
call and then abandon it to the notification handler.

Fixes: 34fa47612b ("afs: Fix race in async call refcounting")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-25-dhowells@redhat.com
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:08 +01:00
David Howells eae9e78951
afs: Use netfslib for symlinks, allowing them to be cached
Use netfslib to read symlinks, thereby allowing them to be cached by
fscache and cachefiles.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-23-dhowells@redhat.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:07 +01:00
David Howells 6dd8093661
afs: Use netfslib for directories
In the AFS ecosystem, directories are just a special type of file that is
downloaded and parsed locally.  Download is done by the same mechanism as
ordinary files and the data can be cached.  There is one important semantic
restriction on directories over files: the client must download the entire
directory in one go because, for example, the server could fabricate the
contents of the blob on the fly with each download and give a different
image each time.

So that we can cache the directory download, switch AFS directory support
over to using the netfslib single-object API, thereby allowing directory
content to be stored in the local cache.

To make this work, the following changes are made:

 (1) A directory's contents are now stored in a folio_queue chain attached
     to the afs_vnode (inode) struct rather than its associated pagecache,
     though multipage folios are still used to hold the data.  The folio
     queue is discarded when the directory inode is evicted.

     This also helps with the phasing out of ITER_XARRAY.

 (2) Various directory operations are made to use and unuse the cache
     cookie.

 (3) The content checking, content dumping and content iteration are now
     performed with a standard iov_iter iterator over the contents of the
     folio queue.

 (4) Iteration and modification must be done with the vnode's validate_lock
     held.  In conjunction with (1), this means that the iteration can be
     done without the need to lock pages or take extra refs on them, unlike
     when accessing ->i_pages.

 (5) Convert to using netfs_read_single() to read data.

 (6) Provide a ->writepages() to call netfs_writeback_single() to save the
     data to the cache according to the VM's scheduling whilst holding the
     validate_lock read-locked as (4).

 (7) Change local directory image editing functions:

     (a) Provide a function to get a specific block by number from the
     	 folio_queue as we can no longer use the i_pages xarray to locate
     	 folios by index.  This uses a cursor to remember the current
     	 position as we need to iterate through the directory contents.
     	 The block is kmapped before being returned.

     (b) Make the function in (a) extend the directory by an extra folio if
     	 we run out of space.

     (c) Raise the check of the block free space counter, for those blocks
     	 that have one, higher in the function to eliminate a call to get a
     	 block.

     (d) Remove the page unlocking and putting done during the editing
     	 loops.  This is no longer necessary as the folio_queue holds the
     	 references and the pages are no longer in the pagecache.

     (e) Mark the inode dirty and pin the cache usage till writeback at the
     	 end of a successful edit.

 (8) Don't set the large_folios flag on the inode as we do the allocation
     ourselves rather than the VM doing it automatically.

 (9) Mark the inode as being a single object that isn't uploaded to the
     server.

(10) Enable caching on directories.

(11) Only set the upload key for writeback for regular files.

Notes:

 (*) We keep the ->release_folio(), ->invalidate_folio() and
     ->migrate_folio() ops as we set the mapping pointer on the folio.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-22-dhowells@redhat.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:07 +01:00
David Howells 49866ce7ea
netfs: Add support for caching single monolithic objects such as AFS dirs
Add support for caching the content of a file that contains a single
monolithic object that must be read/written with a single I/O operation,
such as an AFS directory.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-20-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: netfs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:06 +01:00
David Howells e61bfaad8f
netfs: Add functions to build/clean a buffer in a folio_queue
Add two netfslib functions to build up or clean up a buffer in a
folio_queue.  The first, netfs_alloc_folioq_buffer() will add folios to a
buffer, extending up at least to the given size.  If it can, it will add
multipage folios.  The folios are optionally have the mapping set and will
have the index set according to the distance from the front of the folio
queue.

The second function will free up a folio queue and put any folios in the
queue that have the first mark set.

The netfs_folio tracepoint is also altered to cope with folios that have a
NULL mapping, and the folios being added/put will have trace lines emitted
and will be accounted in the stats.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-19-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: netfs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:06 +01:00
David Howells 9e705016eb
afs: Add more tracepoints to do with tracking validity
Add wrappers to set and clear the callback promise and to mark a directory
as invalidated, and add tracepoints to track these events:

 (1) afs_cb_promise: Log when a callback promise is set on a vnode.

 (2) afs_vnode_invalid: Log when the server's callback promise for a vnode
     is no longer valid and we need to refetch the vnode metadata.

 (3) afs_dir_invalid: Log when the contents of a directory are marked
     invalid and requiring refetching from the server and the cache
     invalidating.

and two tracepoints to record data version number management:

 (4) afs_set_dv: Log when the DV is recorded on a vnode.

 (5) afs_dv_mismatch: Log when the DV recorded on a vnode plus the expected
     delta for the operation does not match the DV we got back from the
     server.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-18-dhowells@redhat.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:06 +01:00
David Howells 229105e5cf
cachefiles: Add auxiliary data trace
Add a display of the first 8 bytes of the downloaded auxiliary data and of
the on-disk stored auxiliary data as these are used in coherency
management.  In the case of afs, this holds the data version number.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-17-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:05 +01:00
David Howells bcb33f79e1
cachefiles: Add some subrequest tracepoints
Add some tracepoints into the cachefiles write paths.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-16-dhowells@redhat.com
cc: netfs@lists.linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:05 +01:00
David Howells 06fa229ceb
netfs: Abstract out a rolling folio buffer implementation
A rolling buffer is a series of folios held in a list of folio_queues.  New
folios and folio_queue structs may be inserted at the head simultaneously
with spent ones being removed from the tail without the need for locking.

The rolling buffer includes an iov_iter and it has to be careful managing
this as the list of folio_queues is extended such that an oops doesn't
incurred because the iterator was pointing to the end of a folio_queue
segment that got appended to and then removed.

We need to use the mechanism twice, once for read and once for write, and,
in future patches, we will use a second rolling buffer to handle bounce
buffering for content encryption.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-6-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:02 +01:00
David Howells aabcabf274
netfs: Add a tracepoint to log the lifespan of folio_queue structs
Add a tracepoint to log the lifespan of folio_queue structs.  For tracing
illustrative purposes, folio_queues are tagged with the debug ID of
whatever they're related to (typically a netfs_io_request) and a debug ID
of their own.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-5-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:02 +01:00
David Howells 2a8a384621
cachefiles: Clean up some whitespace in trace header
Clean up some whitespace in the cachefiles trace header.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-3-dhowells@redhat.com
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:01 +01:00
David Howells d3d3ec8656
netfs: Clean up some whitespace in trace header
Clean up some whitespace in the netfs trace header.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-2-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 22:34:01 +01:00
Dave Airlie c7d6cb4c43 drm-misc-next for 6.14:
UAPI Changes:
 
 Cross-subsystem Changes:
 
 Core Changes:
   - Remove driver date from drm_driver
 
 Driver Changes:
   - amdxdna: New driver!
   - ivpu: Fix qemu crash when using passthrough
   - nouveau: expose GSP-RM logging buffers via debugfs
   - panfrost: Add MT8188 Mali-G57 MC3 support
   - panthor: misc improvements,
   - rockchip: Gamma LUT support
   - tidss: Misc improvements
   - virtio: convert to helpers, add prime support for scanout buffers
   - v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
   - vc4: Add support for BCM2712
   - vkms: Improvements all across the board
 
   - panels:
     - Introduce backlight quirks infrastructure
     - New panels: KDB KD116N2130B12
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCZ1G6igAKCRAnX84Zoj2+
 dpx8AX4m4lM6bo7/I/SDqR6Dw6zDX2AgbupW9NzFoJmlC+X/XOLgKEoCwam+j+09
 hZKYTwcBfRwVa1UDccjHNdWA0IUxUYFQUeiVk59xlBhZZs5vFKorX7r7eMQNl3S1
 gcnSrwy6OQ==
 =/dK/
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2024-12-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

[airlied: handle module ns conflict]

drm-misc-next for 6.14:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:
  - Remove driver date from drm_driver

Driver Changes:
  - amdxdna: New driver!
  - ivpu: Fix qemu crash when using passthrough
  - nouveau: expose GSP-RM logging buffers via debugfs
  - panfrost: Add MT8188 Mali-G57 MC3 support
  - panthor: misc improvements,
  - rockchip: Gamma LUT support
  - tidss: Misc improvements
  - virtio: convert to helpers, add prime support for scanout buffers
  - v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
  - vc4: Add support for BCM2712
  - vkms: Improvements all across the board

  - panels:
    - Introduce backlight quirks infrastructure
    - New panels: KDB KD116N2130B12

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241205-agile-straight-pegasus-aca7f4@houat
2024-12-13 08:48:09 +10:00
Akinobu Mita 6535b8669c mm/damon: fix order of arguments in damos_before_apply tracepoint
Since the order of the scheme_idx and target_idx arguments in TP_ARGS is
reversed, they are stored in the trace record in reverse.

Link: https://lkml.kernel.org/r/20241115182023.43118-1-sj@kernel.org
Link: https://patch.msgid.link/20241112154828.40307-1-akinobu.mita@gmail.com
Fixes: c603c630b5 ("mm/damon/core: add a tracepoint for damos apply target regions")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05 19:54:47 -08:00
Maxime Ripard 3aba2eba84
Merge drm/drm-next into drm-misc-next
Kickstart 6.14 cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-12-02 12:44:18 +01:00
Linus Torvalds baf67f6aa9 NFS client updates for Linux 6.13
Highlights include:
 
 Bugfixes:
 - NFSv4.0: Fix a use-after-free problem in open()
 - nfs/localio: fix for a memory corruption in nfs_local_read_done
 - Revert "nfs: don't reuse partially completed requests in nfs_lock_and_join_requests"
 - nfsv4: ignore SB_RDONLY when mounting nfs
 - sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reseting the transport
 - SUNRPC: timeout and cancel TLS handshake with -ETIMEDOUT
 - sunrpc: fix one UAF issue caused by sunrpc kernel tcp socket
 - pNFS/blocklayout: Fix device registration issues
 - SUNRPC: Fix a hang in TLS sock_close if sk_write_pending
 
 Features and cleanups:
 - localio cleanups from Mike Snitzer
 - Clean up refcounting on the nfs version modules
 - __counted_by() annotations
 - nfs: make processes that are waiting for an I/O lock killable
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmdIrr0ACgkQZwvnipYK
 APKQ3w//ZRqyvhwD1MrK8vyQmDbSPNaMMVx710Hz7GYR5+ij+dGf+FNOr9sLqw8h
 NkVrOhX7V1JRM/lz5mq3zPYCip5ZHKJQZAzLqOUqcBq7RtCG3G31h53so8S+GIap
 j1hXsc2cmADIVm3ztm+HAn5kiT4lcBoeiEmsu/+dL0i5MVhYiEmCIBj3tdnhRtrL
 Gql8nN6zyOCPtOBgiOViNje5w+arcJXN/yFHCWQPU7yPDb/dYDnHSB3ScJsuyxZQ
 CjFn/AAdOfe8cHXGOmHryiQ0KlplwC6oxn1DoOG67FENk4ujFgLpYqnF0yPY5XxG
 bmWuJVV9sFPwQ+n9RBybAK21lvpOMoGN0O+n5fBnALS25FrYEgJBWphqbXwvWdH1
 23PZlTeiBqbjZv80PfCBAXByAmzWffp7wPQVd94Ny3Jr774IXcnAFWeMHgnRhDTj
 5bY3wOxRzmVChLkyxIM9kYM1Wafb2vnXkL/EL8Kav3RpAdAGNbCH6kWOfJIpSR0j
 Is9znfXGNwav6x3kahL7BGKO9WG52YfWCia+vxOcTWYjtgplLPdXMVZZjB6VlWRe
 HzzmXTzRNQ/eMHNqESB04Pyn9pttYQAkVLy2R0ynEV1SQyhSM9E57/QLSOEIyTU8
 u+rsIkCGz9KdHwltKOKxNJ/Jy5khpyPOQC5zrcp7vtctPnAsGek=
 =Ih5w
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Bugfixes:
   - nfs/localio: fix for a memory corruption in nfs_local_read_done
   - Revert "nfs: don't reuse partially completed requests in
     nfs_lock_and_join_requests"
   - nfsv4:
       - ignore SB_RDONLY when mounting nfs
       - Fix a use-after-free problem in open()
   - sunrpc:
       - clear XPRT_SOCK_UPD_TIMEOUT when reseting the transport
       - timeout and cancel TLS handshake with -ETIMEDOUT
       - fix one UAF issue caused by sunrpc kernel tcp socket
       - Fix a hang in TLS sock_close if sk_write_pending
   - pNFS/blocklayout: Fix device registration issues

  Features and cleanups:
   - localio cleanups from Mike Snitzer
   - Clean up refcounting on the nfs version modules
   - __counted_by() annotations
   - nfs: make processes that are waiting for an I/O lock killable"

* tag 'nfs-for-6.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
  fs/nfs/io: make nfs_start_io_*() killable
  nfs/blocklayout: Limit repeat device registration on failure
  nfs/blocklayout: Don't attempt unregister for invalid block device
  sunrpc: fix one UAF issue caused by sunrpc kernel tcp socket
  SUNRPC: timeout and cancel TLS handshake with -ETIMEDOUT
  sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reset transport
  nfs: ignore SB_RDONLY when mounting nfs
  Revert "nfs: don't reuse partially completed requests in nfs_lock_and_join_requests"
  Revert "fs: nfs: fix missing refcnt by replacing folio_set_private by folio_attach_private"
  nfs/localio: must clear res.replen in nfs_local_read_done
  NFSv4.0: Fix a use-after-free problem in the asynchronous open()
  NFSv4.0: Fix the wake up of the next waiter in nfs_release_seqid()
  SUNRPC: Fix a hang in TLS sock_close if sk_write_pending
  sunrpc: remove newlines from tracepoints
  nfs: Annotate struct pnfs_commit_array with __counted_by()
  nfs/localio: eliminate need for nfs_local_fsync_work forward declaration
  nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter
  nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx
  nfs/localio: remove redundant suid/sgid handling
  NFS: Implement get_nfs_version()
  ...
2024-11-30 10:17:53 -08:00
Linus Torvalds 7f4f3b14e8 Add Rust support for trace events:
- Allow Rust code to have trace events
 
   Trace events is a popular way to debug what is happening inside the kernel
   or just to find out what is happening. Rust code is being added to the
   Linux kernel but it currently does not support the tracing infrastructure.
   Add support of trace events inside Rust code.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ0DjqhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrLlAPsF6t/c1nHSGTKDv9FJDJe4JHdP7e+U
 7X0S8BmSTKFNAQD+K2TEd0bjVP7ug8dQZBT+fveiFr+ARYxAwJ3JnEFjUwg=
 =Ab+T
 -----END PGP SIGNATURE-----

Merge tag 'trace-rust-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull rust trace event support from Steven Rostedt:
 "Allow Rust code to have trace events

  Trace events is a popular way to debug what is happening inside the
  kernel or just to find out what is happening. Rust code is being added
  to the Linux kernel but it currently does not support the tracing
  infrastructure. Add support of trace events inside Rust code"

* tag 'trace-rust-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rust: jump_label: skip formatting generated file
  jump_label: rust: pass a mut ptr to `static_key_count`
  samples: rust: fix `rust_print` build making it a combined module
  rust: add arch_static_branch
  jump_label: adjust inline asm to be consistent
  rust: samples: add tracepoint to Rust sample
  rust: add tracepoint support
  rust: add static_branch_unlikely for static_key_false
2024-11-25 15:44:29 -08:00
Linus Torvalds 9f16d5e6f2 The biggest change here is eliminating the awful idea that KVM had, of
essentially guessing which pfns are refcounted pages.  The reason to
 do so was that KVM needs to map both non-refcounted pages (for example
 BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP VMAs that contain
 refcounted pages.  However, the result was security issues in the past,
 and more recently the inability to map VM_IO and VM_PFNMAP memory
 that _is_ backed by struct page but is not refcounted.  In particular
 this broke virtio-gpu blob resources (which directly map host graphics
 buffers into the guest as "vram" for the virtio-gpu device) with the
 amdgpu driver, because amdgpu allocates non-compound higher order pages
 and the tail pages could not be mapped into KVM.
 
 This requires adjusting all uses of struct page in the per-architecture
 code, to always work on the pfn whenever possible.  The large series that
 did this, from David Stevens and Sean Christopherson, also cleaned up
 substantially the set of functions that provided arch code with the
 pfn for a host virtual addresses.  The previous maze of twisty little
 passages, all different, is replaced by five functions (__gfn_to_page,
 __kvm_faultin_pfn, the non-__ versions of these two, and kvm_prefetch_pages)
 saving almost 200 lines of code.
 
 ARM:
 
 * Support for stage-1 permission indirection (FEAT_S1PIE) and
   permission overlays (FEAT_S1POE), including nested virt + the
   emulated page table walker
 
 * Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call
   was introduced in PSCIv1.3 as a mechanism to request hibernation,
   similar to the S4 state in ACPI
 
 * Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
   part of it, introduce trivial initialization of the host's MPAM
   context so KVM can use the corresponding traps
 
 * PMU support under nested virtualization, honoring the guest
   hypervisor's trap configuration and event filtering when running a
   nested guest
 
 * Fixes to vgic ITS serialization where stale device/interrupt table
   entries are not zeroed when the mapping is invalidated by the VM
 
 * Avoid emulated MMIO completion if userspace has requested synchronous
   external abort injection
 
 * Various fixes and cleanups affecting pKVM, vCPU initialization, and
   selftests
 
 LoongArch:
 
 * Add iocsr and mmio bus simulation in kernel.
 
 * Add in-kernel interrupt controller emulation.
 
 * Add support for virtualization extensions to the eiointc irqchip.
 
 PPC:
 
 * Drop lingering and utterly obsolete references to PPC970 KVM, which was
   removed 10 years ago.
 
 * Fix incorrect documentation references to non-existing ioctls
 
 RISC-V:
 
 * Accelerate KVM RISC-V when running as a guest
 
 * Perf support to collect KVM guest statistics from host side
 
 s390:
 
 * New selftests: more ucontrol selftests and CPU model sanity checks
 
 * Support for the gen17 CPU model
 
 * List registers supported by KVM_GET/SET_ONE_REG in the documentation
 
 x86:
 
 * Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve
   documentation, harden against unexpected changes.  Even if the hardware
   A/D tracking is disabled, it is possible to use the hardware-defined A/D
   bits to track if a PFN is Accessed and/or Dirty, and that removes a lot
   of special cases.
 
 * Elide TLB flushes when aging secondary PTEs, as has been done in x86's
   primary MMU for over 10 years.
 
 * Recover huge pages in-place in the TDP MMU when dirty page logging is
   toggled off, instead of zapping them and waiting until the page is
   re-accessed to create a huge mapping.  This reduces vCPU jitter.
 
 * Batch TLB flushes when dirty page logging is toggled off.  This reduces
   the time it takes to disable dirty logging by ~3x.
 
 * Remove the shrinker that was (poorly) attempting to reclaim shadow page
   tables in low-memory situations.
 
 * Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE.
 
 * Advertise CPUIDs for new instructions in Clearwater Forest
 
 * Quirk KVM's misguided behavior of initialized certain feature MSRs to
   their maximum supported feature set, which can result in KVM creating
   invalid vCPU state.  E.g. initializing PERF_CAPABILITIES to a non-zero
   value results in the vCPU having invalid state if userspace hides PDCM
   from the guest, which in turn can lead to save/restore failures.
 
 * Fix KVM's handling of non-canonical checks for vCPUs that support LA57
   to better follow the "architecture", in quotes because the actual
   behavior is poorly documented.  E.g. most MSR writes and descriptor
   table loads ignore CR4.LA57 and operate purely on whether the CPU
   supports LA57.
 
 * Bypass the register cache when querying CPL from kvm_sched_out(), as
   filling the cache from IRQ context is generally unsafe; harden the
   cache accessors to try to prevent similar issues from occuring in the
   future.  The issue that triggered this change was already fixed in 6.12,
   but was still kinda latent.
 
 * Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM
   over-advertises SPEC_CTRL when trying to support cross-vendor VMs.
 
 * Minor cleanups
 
 * Switch hugepage recovery thread to use vhost_task.  These kthreads can
   consume significant amounts of CPU time on behalf of a VM or in response
   to how the VM behaves (for example how it accesses its memory); therefore
   KVM tried to place the thread in the VM's cgroups and charge the CPU
   time consumed by that work to the VM's container.  However the kthreads
   did not process SIGSTOP/SIGCONT, and therefore cgroups which had KVM
   instances inside could not complete freezing.  Fix this by replacing the
   kthread with a PF_USER_WORKER thread, via the vhost_task abstraction.
   Another 100+ lines removed, with generally better behavior too like
   having these threads properly parented in the process tree.
 
 * Revert a workaround for an old CPU erratum (Nehalem/Westmere) that didn't
   really work; there was really nothing to work around anyway: the broken
   patch was meant to fix nested virtualization, but the PERF_GLOBAL_CTRL
   MSR is virtualized and therefore unaffected by the erratum.
 
 * Fix 6.12 regression where CONFIG_KVM will be built as a module even
   if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is 'y'.
 
 x86 selftests:
 
 * x86 selftests can now use AVX.
 
 Documentation:
 
 * Use rST internal links
 
 * Reorganize the introduction to the API document
 
 Generic:
 
 * Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead
   of RCU, so that running a vCPU on a different task doesn't encounter long
   due to having to wait for all CPUs become quiescent.  In general both reads
   and writes are rare, but userspace that supports confidential computing is
   introducing the use of "helper" vCPUs that may jump from one host processor
   to another.  Those will be very happy to trigger a synchronize_rcu(), and
   the effect on performance is quite the disaster.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmc9MRYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroP00QgArxqxBIGLCW5t7bw7vtNq63QYRyh4
 dTiDguLiYQJ+AXmnRu11R6aPC7HgMAvlFCCmH+GEce4WEgt26hxCmncJr/aJOSwS
 letCS7TrME16PeZvh25A1nhPBUw6mTF1qqzgcdHMrqXG8LuHoGcKYGSRVbkf3kfI
 1ZoMq1r8ChXbVVmCx9DQ3gw1TVr5Dpjs2voLh8rDSE9Xpw0tVVabHu3/NhQEz/F+
 t8/nRaqH777icCHIf9PCk5HnarHxLAOvhM2M0Yj09PuBcE5fFQxpxltw/qiKQqqW
 ep4oquojGl87kZnhlDaac2UNtK90Ws+WxxvCwUmbvGN0ZJVaQwf4FvTwig==
 =lWpE
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "The biggest change here is eliminating the awful idea that KVM had of
  essentially guessing which pfns are refcounted pages.

  The reason to do so was that KVM needs to map both non-refcounted
  pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP
  VMAs that contain refcounted pages.

  However, the result was security issues in the past, and more recently
  the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by
  struct page but is not refcounted. In particular this broke virtio-gpu
  blob resources (which directly map host graphics buffers into the
  guest as "vram" for the virtio-gpu device) with the amdgpu driver,
  because amdgpu allocates non-compound higher order pages and the tail
  pages could not be mapped into KVM.

  This requires adjusting all uses of struct page in the
  per-architecture code, to always work on the pfn whenever possible.
  The large series that did this, from David Stevens and Sean
  Christopherson, also cleaned up substantially the set of functions
  that provided arch code with the pfn for a host virtual addresses.

  The previous maze of twisty little passages, all different, is
  replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the
  non-__ versions of these two, and kvm_prefetch_pages) saving almost
  200 lines of code.

  ARM:

   - Support for stage-1 permission indirection (FEAT_S1PIE) and
     permission overlays (FEAT_S1POE), including nested virt + the
     emulated page table walker

   - Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This
     call was introduced in PSCIv1.3 as a mechanism to request
     hibernation, similar to the S4 state in ACPI

   - Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
     part of it, introduce trivial initialization of the host's MPAM
     context so KVM can use the corresponding traps

   - PMU support under nested virtualization, honoring the guest
     hypervisor's trap configuration and event filtering when running a
     nested guest

   - Fixes to vgic ITS serialization where stale device/interrupt table
     entries are not zeroed when the mapping is invalidated by the VM

   - Avoid emulated MMIO completion if userspace has requested
     synchronous external abort injection

   - Various fixes and cleanups affecting pKVM, vCPU initialization, and
     selftests

  LoongArch:

   - Add iocsr and mmio bus simulation in kernel.

   - Add in-kernel interrupt controller emulation.

   - Add support for virtualization extensions to the eiointc irqchip.

  PPC:

   - Drop lingering and utterly obsolete references to PPC970 KVM, which
     was removed 10 years ago.

   - Fix incorrect documentation references to non-existing ioctls

  RISC-V:

   - Accelerate KVM RISC-V when running as a guest

   - Perf support to collect KVM guest statistics from host side

  s390:

   - New selftests: more ucontrol selftests and CPU model sanity checks

   - Support for the gen17 CPU model

   - List registers supported by KVM_GET/SET_ONE_REG in the
     documentation

  x86:

   - Cleanup KVM's handling of Accessed and Dirty bits to dedup code,
     improve documentation, harden against unexpected changes.

     Even if the hardware A/D tracking is disabled, it is possible to
     use the hardware-defined A/D bits to track if a PFN is Accessed
     and/or Dirty, and that removes a lot of special cases.

   - Elide TLB flushes when aging secondary PTEs, as has been done in
     x86's primary MMU for over 10 years.

   - Recover huge pages in-place in the TDP MMU when dirty page logging
     is toggled off, instead of zapping them and waiting until the page
     is re-accessed to create a huge mapping. This reduces vCPU jitter.

   - Batch TLB flushes when dirty page logging is toggled off. This
     reduces the time it takes to disable dirty logging by ~3x.

   - Remove the shrinker that was (poorly) attempting to reclaim shadow
     page tables in low-memory situations.

   - Clean up and optimize KVM's handling of writes to
     MSR_IA32_APICBASE.

   - Advertise CPUIDs for new instructions in Clearwater Forest

   - Quirk KVM's misguided behavior of initialized certain feature MSRs
     to their maximum supported feature set, which can result in KVM
     creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to
     a non-zero value results in the vCPU having invalid state if
     userspace hides PDCM from the guest, which in turn can lead to
     save/restore failures.

   - Fix KVM's handling of non-canonical checks for vCPUs that support
     LA57 to better follow the "architecture", in quotes because the
     actual behavior is poorly documented. E.g. most MSR writes and
     descriptor table loads ignore CR4.LA57 and operate purely on
     whether the CPU supports LA57.

   - Bypass the register cache when querying CPL from kvm_sched_out(),
     as filling the cache from IRQ context is generally unsafe; harden
     the cache accessors to try to prevent similar issues from occuring
     in the future. The issue that triggered this change was already
     fixed in 6.12, but was still kinda latent.

   - Advertise AMD_IBPB_RET to userspace, and fix a related bug where
     KVM over-advertises SPEC_CTRL when trying to support cross-vendor
     VMs.

   - Minor cleanups

   - Switch hugepage recovery thread to use vhost_task.

     These kthreads can consume significant amounts of CPU time on
     behalf of a VM or in response to how the VM behaves (for example
     how it accesses its memory); therefore KVM tried to place the
     thread in the VM's cgroups and charge the CPU time consumed by that
     work to the VM's container.

     However the kthreads did not process SIGSTOP/SIGCONT, and therefore
     cgroups which had KVM instances inside could not complete freezing.

     Fix this by replacing the kthread with a PF_USER_WORKER thread, via
     the vhost_task abstraction. Another 100+ lines removed, with
     generally better behavior too like having these threads properly
     parented in the process tree.

   - Revert a workaround for an old CPU erratum (Nehalem/Westmere) that
     didn't really work; there was really nothing to work around anyway:
     the broken patch was meant to fix nested virtualization, but the
     PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the
     erratum.

   - Fix 6.12 regression where CONFIG_KVM will be built as a module even
     if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is
     'y'.

  x86 selftests:

   - x86 selftests can now use AVX.

  Documentation:

   - Use rST internal links

   - Reorganize the introduction to the API document

  Generic:

   - Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock
     instead of RCU, so that running a vCPU on a different task doesn't
     encounter long due to having to wait for all CPUs become quiescent.

     In general both reads and writes are rare, but userspace that
     supports confidential computing is introducing the use of "helper"
     vCPUs that may jump from one host processor to another. Those will
     be very happy to trigger a synchronize_rcu(), and the effect on
     performance is quite the disaster"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits)
  KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD
  KVM: x86: add back X86_LOCAL_APIC dependency
  Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()"
  KVM: x86: switch hugepage recovery thread to vhost_task
  KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR
  x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest
  Documentation: KVM: fix malformed table
  irqchip/loongson-eiointc: Add virt extension support
  LoongArch: KVM: Add irqfd support
  LoongArch: KVM: Add PCHPIC user mode read and write functions
  LoongArch: KVM: Add PCHPIC read and write functions
  LoongArch: KVM: Add PCHPIC device support
  LoongArch: KVM: Add EIOINTC user mode read and write functions
  LoongArch: KVM: Add EIOINTC read and write functions
  LoongArch: KVM: Add EIOINTC device support
  LoongArch: KVM: Add IPI user mode read and write function
  LoongArch: KVM: Add IPI read and write function
  LoongArch: KVM: Add IPI device support
  LoongArch: KVM: Add iocsr and mmio bus simulation in kernel
  KVM: arm64: Pass on SVE mapping failures
  ...
2024-11-23 16:00:50 -08:00
Linus Torvalds 5c00ff742b - The series "zram: optimal post-processing target selection" from
Sergey Senozhatsky improves zram's post-processing selection algorithm.
   This leads to improved memory savings.
 
 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
 
 	- "refine mas_mab_cp()"
 	- "Reduce the space to be cleared for maple_big_node"
 	- "maple_tree: simplify mas_push_node()"
 	- "Following cleanup after introduce mas_wr_store_type()"
 	- "refine storing null"
 
 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.
 
 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationaizations and cleanups in the page mapping code.
 
 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of shadow
   entries.
 
 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.
 
 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in the
   hugetlb code.
 
 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page into
   small pages.  Instead we replace the whole thing with a THP.  More
   consistent cleaner and potentiall saves a large number of pagefaults.
 
 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built in percpu test code.
 
 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to do.
 
 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio size
   rather than as individual pages.  A 20% speedup was observed.
 
 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting.
 
 - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt
   removes the long-deprecated memcgv2 charge moving feature.
 
 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleanup up some of the mmap() error handling and
   addresses some potential performance issues.
 
 - The series "x86/module: use large ROX pages for text allocations" from
   Mike Rapoport teaches x86 to use large pages for read-only-execute
   module text.
 
 - The series "page allocation tag compression" from Suren Baghdasaryan
   is followon maintenance work for the new page allocation profiling
   feature.
 
 - The series "page->index removals in mm" from Matthew Wilcox remove
   most references to page->index in mm/.  A slow march towards shrinking
   struct page.
 
 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.
 
 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression.  It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.
 
 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests
   over to the KUnit framework.
 
 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userapace to place fault-generating guard pages within a single
   VMA, rather than requiring that multiple VMAs be created for this.
   Improved efficiencies for userspace memory allocators are expected.
 
 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.
 
 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.
 
 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configuremultisize THP from
   the kernel boot command line.
 
 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.
 
 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep is
   enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzwFqgAKCRDdBJ7gKXxA
 jkeuAQCkl+BmeYHE6uG0hi3pRxkupseR6DEOAYIiTv0/l8/GggD/Z3jmEeqnZaNq
 xyyenpibWgUoShU2wZ/Ha8FE5WDINwg=
 =JfWR
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "zram: optimal post-processing target selection" from
   Sergey Senozhatsky improves zram's post-processing selection
   algorithm. This leads to improved memory savings.

 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
	- "refine mas_mab_cp()"
	- "Reduce the space to be cleared for maple_big_node"
	- "maple_tree: simplify mas_push_node()"
	- "Following cleanup after introduce mas_wr_store_type()"
	- "refine storing null"

 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.

 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationaizations and cleanups in the page mapping
   code.

 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of
   shadow entries.

 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.

 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in
   the hugetlb code.

 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page
   into small pages. Instead we replace the whole thing with a THP. More
   consistent cleaner and potentiall saves a large number of pagefaults.

 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built in percpu test code.

 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to
   do.

 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio
   size rather than as individual pages. A 20% speedup was observed.

 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON
   splitting.

 - The series "memcg-v1: fully deprecate charge moving" from Shakeel
   Butt removes the long-deprecated memcgv2 charge moving feature.

 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleanup up some of the mmap() error handling and
   addresses some potential performance issues.

 - The series "x86/module: use large ROX pages for text allocations"
   from Mike Rapoport teaches x86 to use large pages for
   read-only-execute module text.

 - The series "page allocation tag compression" from Suren Baghdasaryan
   is followon maintenance work for the new page allocation profiling
   feature.

 - The series "page->index removals in mm" from Matthew Wilcox remove
   most references to page->index in mm/. A slow march towards shrinking
   struct page.

 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.

 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression. It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.

 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
   tests over to the KUnit framework.

 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userapace to place fault-generating guard pages within a
   single VMA, rather than requiring that multiple VMAs be created for
   this. Improved efficiencies for userspace memory allocators are
   expected.

 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.

 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.

 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configuremultisize THP
   from the kernel boot command line.

 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.

 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep
   is enabled.

* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
  cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
  mm/kfence: add a new kunit test test_use_after_free_read_nofault()
  zram: fix NULL pointer in comp_algorithm_show()
  memcg/hugetlb: add hugeTLB counters to memcg
  vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
  mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
  zram: ZRAM_DEF_COMP should depend on ZRAM
  MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
  Docs/mm/damon: recommend academic papers to read and/or cite
  mm: define general function pXd_init()
  kmemleak: iommu/iova: fix transient kmemleak false positive
  mm/list_lru: simplify the list_lru walk callback function
  mm/list_lru: split the lock to per-cgroup scope
  mm/list_lru: simplify reparenting and initial allocation
  mm/list_lru: code clean up for reparenting
  mm/list_lru: don't export list_lru_add
  mm/list_lru: don't pass unnecessary key parameters
  kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
  kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
  kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
  ...
2024-11-23 09:58:07 -08:00
Linus Torvalds 06afb0f361 tracing updates for v6.13:
- Addition of faultable tracepoints
 
   There's a tracepoint attached to both a system call entry and exit. This
   location is known to allow page faults. The tracepoints are called under
   an rcu_read_lock() which does not allow faults that can sleep. This limits
   the ability of tracepoint handlers to page fault in user space system call
   parameters. Now these tracepoints have been made "faultable", allowing the
   callbacks to fault in user space parameters and record them.
 
   Note, only the infrastructure has been implemented. The consumers (perf,
   ftrace, BPF) now need to have their code modified to allow faults.
 
 - Fix up of BPF code for the tracepoint faultable logic
 
 - Update tracepoints to use the new static branch API
 
 - Remove trace_*_rcuidle() variants and the SRCU protection they used
 
 - Remove unused TRACE_EVENT_FL_FILTERED logic
 
 - Replace strncpy() with strscpy() and memcpy()
 
 - Use replace per_cpu_ptr(smp_processor_id()) with this_cpu_ptr()
 
 - Fix perf events to not duplicate samples when tracing is enabled
 
 - Replace atomic64_add_return(1, counter) with atomic64_inc_return(counter)
 
 - Make stack trace buffer 4K instead of PAGE_SIZE
 
 - Remove TRACE_FLAG_IRQS_NOSUPPORT flag as it was never used
 
 - Get the true return address for function tracer when function graph tracer
   is also running.
 
   When function_graph trace is running along with function tracer,
   the parent function of the function tracer sometimes is
   "return_to_handler", which is the function graph trampoline to record
   the exit of the function. Use existing logic that calls into the
   fgraph infrastructure to find the real return address.
 
 - Remove (un)regfunc pointers out of tracepoint structure
 
 - Added last minute bug fix for setting pending modules in stack function
   filter.
 
   echo "write*:mod:ext3" > /sys/kernel/tracing/stack_trace_filter
 
   Would cause a kernel NULL dereference.
 
 - Minor clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZz6dehQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlQsAP9aB0XGUV3UykvjZuKK84VDZ26a2hZH
 X2JDYsNA4luuPAEAz/BG2rnslfMZ04WTMAl8h1eh10lxcuHG0wQMHVBXIwI=
 =lzb5
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Addition of faultable tracepoints

   There's a tracepoint attached to both a system call entry and exit.
   This location is known to allow page faults. The tracepoints are
   called under an rcu_read_lock() which does not allow faults that can
   sleep. This limits the ability of tracepoint handlers to page fault
   in user space system call parameters. Now these tracepoints have been
   made "faultable", allowing the callbacks to fault in user space
   parameters and record them.

   Note, only the infrastructure has been implemented. The consumers
   (perf, ftrace, BPF) now need to have their code modified to allow
   faults.

 - Fix up of BPF code for the tracepoint faultable logic

 - Update tracepoints to use the new static branch API

 - Remove trace_*_rcuidle() variants and the SRCU protection they used

 - Remove unused TRACE_EVENT_FL_FILTERED logic

 - Replace strncpy() with strscpy() and memcpy()

 - Use replace per_cpu_ptr(smp_processor_id()) with this_cpu_ptr()

 - Fix perf events to not duplicate samples when tracing is enabled

 - Replace atomic64_add_return(1, counter) with
   atomic64_inc_return(counter)

 - Make stack trace buffer 4K instead of PAGE_SIZE

 - Remove TRACE_FLAG_IRQS_NOSUPPORT flag as it was never used

 - Get the true return address for function tracer when function graph
   tracer is also running.

   When function_graph trace is running along with function tracer, the
   parent function of the function tracer sometimes is
   "return_to_handler", which is the function graph trampoline to record
   the exit of the function. Use existing logic that calls into the
   fgraph infrastructure to find the real return address.

 - Remove (un)regfunc pointers out of tracepoint structure

 - Added last minute bug fix for setting pending modules in stack
   function filter.

     echo "write*:mod:ext3" > /sys/kernel/tracing/stack_trace_filter

   Would cause a kernel NULL dereference.

 - Minor clean ups

* tag 'trace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits)
  ftrace: Fix regression with module command in stack_trace_filter
  tracing: Fix function name for trampoline
  ftrace: Get the true parent ip for function tracer
  tracing: Remove redundant check on field->field in histograms
  bpf: ensure RCU Tasks Trace GP for sleepable raw tracepoint BPF links
  bpf: decouple BPF link/attach hook and BPF program sleepable semantics
  bpf: put bpf_link's program when link is safe to be deallocated
  tracing: Replace strncpy() with strscpy() when copying comm
  tracing: Add might_fault() check in __DECLARE_TRACE_SYSCALL
  tracing: Fix syscall tracepoint use-after-free
  tracing: Introduce tracepoint_is_faultable()
  tracing: Introduce tracepoint extended structure
  tracing: Remove TRACE_FLAG_IRQS_NOSUPPORT
  tracing: Replace multiple deprecated strncpy with memcpy
  tracing: Make percpu stack trace buffer invariant to PAGE_SIZE
  tracing: Use atomic64_inc_return() in trace_clock_counter()
  trace/trace_event_perf: remove duplicate samples on the first tracepoint event
  tracing/bpf: Add might_fault check to syscall probes
  tracing/perf: Add might_fault check to syscall probes
  tracing/ftrace: Add might_fault check to syscall probes
  ...
2024-11-22 13:27:01 -08:00
Lizhi Hou aac243092b accel/amdxdna: Add command execution
Add interfaces for user application to submit command and wait for its
completion.

Co-developed-by: Min Ma <min.ma@amd.com>
Signed-off-by: Min Ma <min.ma@amd.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241118172942.2014541-8-lizhi.hou@amd.com
2024-11-22 11:43:27 -07:00
Lizhi Hou b87f920b93 accel/amdxdna: Support hardware mailbox
The hardware mailboxes are used by the driver to submit requests to
firmware and receive the completion notices from hardware.

Initially, a management mailbox channel is up and running. The driver may
request firmware to create/destroy more channels dynamically through
management channel.

Add driver internal mailbox interfaces.
  - create/destroy a mailbox channel instance
  - send a message to the firmware through a specific channel
  - wait for a notification from the specific channel

Co-developed-by: George Yang <George.Yang@amd.com>
Signed-off-by: George Yang <George.Yang@amd.com>
Co-developed-by: Min Ma <min.ma@amd.com>
Signed-off-by: Min Ma <min.ma@amd.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241118172942.2014541-4-lizhi.hou@amd.com
2024-11-22 11:41:54 -07:00
Linus Torvalds 51ae62a12c dma-mapping updates for Linux 6.13
- improve the DMA API tracing code (Sean Anderson)
  - misc cleanups (Christoph Hellwig, Sui Jingfeng)
  - fix pointer abuse when finding the shared DMA pool (Geert Uytterhoeven)
  - fix a deadlock in dma-debug (Levi Yun)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmc8xN8LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNwEBAAtd0zTiNuEUklY6YtZ7l/Zaudibmq1klHLGAQZEa9
 J4P2zzJ6xTkUblq/aVmFUQmf+vuuszjHIrrXnL3tAulSQKxS5Zj3Cci4cW4IAfBn
 GXB3OTR2lgXSk+8sulgiwc1AA8xgIFJJgZDTni1WdiW9LwLvUyYI1XNVAwCYOM2J
 HS2QxIySm3eg23F5bRz+Xl3LQlWYlHkMHryqKloHWIqchmVpYlYbj7uBMjAH4FKz
 l3zhd9pZSp9w5NNCp2Y/d81XdOUSjcYSR1gUotLzmW0Sj3YjnKXKdjjlPrj3zimb
 9EhgdalnpVrJ4Nr7MmpSUEbTVs+hBjXDoxTnnBRlKEl5aIKqceCrSBvoP70ygbkf
 KRqNS4ZxKe59cfnWAZQVcg8g01TetCoJR6QyGaoTE9Lz+9cPl2xAwyFmcYN2w/Cp
 qs0ZEFiNpqLAN5zwR/Pakz5YgIA/3N5MW0d9X9yEH9l4+HUMxWIF/qvThBSsGswT
 EmVUQqPpEzGJrcNYgC1UsEBltGmle02BwcoFEdMr7bzldW7yIpoDEOkKkBM3JFF9
 vgkpAkZGA5j4VMSkSwOrhi1rI0XAoImtJeM0wqhLtpXgQDjrMd3DaW6by6uUeH5x
 DcXf6qVOAsB04je9JkHh9I4BXVrWC01MSgFdjfQRl9gktn7970YFswG4ksYAwxU6
 xHQ=
 =ivZc
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.13-2024-11-19' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - improve the DMA API tracing code (Sean Anderson)

 - misc cleanups (Christoph Hellwig, Sui Jingfeng)

 - fix pointer abuse when finding the shared DMA pool (Geert
   Uytterhoeven)

 - fix a deadlock in dma-debug (Levi Yun)

* tag 'dma-mapping-6.13-2024-11-19' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: save base/size instead of pointer to shared DMA pool
  dma-mapping: fix swapped dir/flags arguments to trace_dma_alloc_sgt_err
  dma-mapping: drop unneeded includes from dma-mapping.h
  dma-mapping: trace more error paths
  dma-mapping: use trace_dma_alloc for dma_alloc* instead of using trace_dma_map
  dma-mapping: trace dma_alloc/free direction
  dma-mapping: use macros to define events in a class
  dma-mapping: remove an outdated comment from dma-map-ops.h
  dma-debug: remove DMA_API_DEBUG_SG
  dma-debug: store a phys_addr_t in struct dma_debug_entry
  dma-debug: fix a possible deadlock on radix_lock
2024-11-21 11:28:39 -08:00
Linus Torvalds fcc79e1714 Networking changes for 6.13.
The most significant set of changes is the per netns RTNL. The new
 behavior is disabled by default, regression risk should be contained.
 
 Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
 default value from PTP_1588_CLOCK_KVM, as the first is intended to be
 a more reliable replacement for the latter.
 
 Core
 ----
 
  - Started a very large, in-progress, effort to make the RTNL lock
    scope per network-namespace, thus reducing the lock contention
    significantly in the containerized use-case, comprising:
    - RCU-ified some relevant slices of the FIB control path
    - introduce basic per netns locking helpers
    - namespacified the IPv4 address hash table
    - remove rtnl_register{,_module}() in favour of rtnl_register_many()
    - refactor rtnl_{new,del,set}link() moving as much validation as
      possible out of RTNL lock
    - convert all phonet doit() and dumpit() handlers to RCU
    - convert IPv4 addresses manipulation to per-netns RTNL
    - convert virtual interface creation to per-netns RTNL
    the per-netns lock infra is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL
    knob, disabled by default ad interim.
 
  - Introduce NAPI suspension, to efficiently switching between busy
    polling (NAPI processing suspended) and normal processing.
 
  - Migrate the IPv4 routing input, output and control path from direct
    ToS usage to DSCP macros. This is a work in progress to make ECN
    handling consistent and reliable.
 
  - Add drop reasons support to the IPv4 rotue input path, allowing
    better introspection in case of packets drop.
 
  - Make FIB seqnum lockless, dropping RTNL protection for read
    access.
 
  - Make inet{,v6} addresses hashing less predicable.
 
  - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
    and timestamps
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Add small file operations for debugfs, to reduce the struct ops size.
 
  - Refactoring and optimization for the implementation of page_frag API,
    This is a preparatory work to consolidate the page_frag
    implementation.
 
 Netfilter
 ---------
 
  - Optimize set element transactions to reduce memory consumption
 
  - Extended netlink error reporting for attribute parser failure.
 
  - Make legacy xtables configs user selectable, giving users
    the option to configure iptables without enabling any other config.
 
  - Address a lot of false-positive RCU issues, pointed by recent
    CI improvements.
 
 BPF
 ---
 
  - Put xsk sockets on a struct diet and add various cleanups. Overall,
    this helps to bump performance by 12% for some workloads.
 
  - Extend BPF selftests to increase coverage of XDP features in
    combination with BPF cpumap.
 
  - Optimize and homogenize bpf_csum_diff helper for all archs and also
    add a batch of new BPF selftests for it.
 
  - Extend netkit with an option to delegate skb->{mark,priority}
    scrubbing to its BPF program.
 
  - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
    programs.
 
 Protocols
 ---------
 
  - Introduces 4-tuple hash for connected udp sockets, speeding-up
    significantly connected sockets lookup.
 
  - Add a fastpath for some TCP timers that usually expires after close,
    the socket lock contention.
 
  - Add inbound and outbound xfrm state caches to speed up state lookups.
 
  - Avoid sending MPTCP advertisements on stale subflows, reducing
    risks on loosing them.
 
  - Make neighbours table flushing more scalable, maintaining per device
    neigh lists.
 
 Driver API
 ----------
 
  - Introduce a unified interface to configure transmission H/W shaping,
    and expose it to user-space via generic-netlink.
 
  - Add support for per-NAPI config via netlink. This makes napi
    configuration persistent across queues removal and re-creation.
    Requires driver updates, currently supported drivers are:
    nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
 
  - Add ethtool support for writing SFP / PHY firmware blocks.
 
  - Track RSS context allocation from ethtool core.
 
  - Implement support for mirroring to DSA CPU port, via TC mirror
    offload.
 
  - Consolidate FDB updates notification, to avoid duplicates on
    device-specific entries.
 
  - Expose DPLL clock quality level to the user-space.
 
  - Support master-slave PHY config via device tree.
 
 Tests and tooling
 -----------------
 
  - forwarding: introduce deferred commands, to simplify
    the cleanup phase
 
 Drivers
 -------
 
  - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
    Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
    IRQs and queues to NAPI IDs, allowing busy polling and better
    introspection.
 
  - Ethernet high-speed NICs:
    - nVidia/Mellanox:
      - mlx5:
        - a large refactor to implement support for cross E-Switch
          scheduling
        - refactor H/W conter management to let it scale better
        - H/W GRO cleanups
    - Intel (100G, ice)::
      - adds support for ethtool reset
      - implement support for per TX queue H/W shaping
    - AMD/Solarflare:
      - implement per device queue stats support
    - Broadcom (bnxt):
      - improve wildcard l4proto on IPv4/IPv6 ntuple rules
    - Marvell Octeon:
      - Adds representor support for each Resource Virtualization Unit
        (RVU) device.
    - Hisilicon:
      - adds support for the BMC Gigabit Ethernet
    - IBM (EMAC):
      - driver cleanup and modernization
    - Cisco (VIC):
      - raise the queues number limit to 256
 
  - Ethernet virtual:
    - Google vNIC:
      - implements page pool support
    - macsec:
      - inherit lower device's features and TSO limits when offloading
    - virtio_net:
      - enable premapped mode by default
      - support for XDP socket(AF_XDP) zerocopy TX
    - wireguard:
      - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
        packets.
 
  - Ethernet NICs embedded and virtual:
    - Broadcom ASP:
      - enable software timestamping
    - Freescale:
      - add enetc4 PF driver
    - MediaTek: Airoha SoC:
      - implement BQL support
    - RealTek r8169:
      - enable TSO by default on r8168/r8125
      - implement extended ethtool stats
    - Renesas AVB:
      - enable TX checksum offload
    - Synopsys (stmmac):
      - support header splitting for vlan tagged packets
      - move common code for DWMAC4 and DWXGMAC into a separate FPE
        module.
      - Add the dwmac driver support for T-HEAD TH1520 SoC
    - Synopsys (xpcs):
      - driver refactor and cleanup
    - TI:
      - icssg_prueth: add VLAN offload support
    - Xilinx emaclite:
      - adds clock support
 
  - Ethernet switches:
    - Microchip:
      - implement support for the lan969x Ethernet switch family
      - add LAN9646 switch support to KSZ DSA driver
 
  - Ethernet PHYs:
    - Marvel: 88q2x: enable auto negotiation
    - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
 
  - PTP:
    - Add support for the Amazon virtual clock device
    - Add PtP driver for s390 clocks
 
  - WiFi:
    - mac80211
      - EHT 1024 aggregation size for transmissions
      - new operation to indicate that a new interface is to be added
      - support radio separation of multi-band devices
      - move wireless extension spy implementation to libiw
    - Broadcom:
      - brcmfmac: optional LPO clock support
    - Microchip:
      - add support for Atmel WILC3000
    - Qualcomm (ath12k):
      - firmware coredump collection support
      - add debugfs support for a multitude of statistics
    - Qualcomm (ath5k):
      -  Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
    - Realtek:
      - rtw88: 8821au and 8812au USB adapters support
      - rtw89: add thermal protection
      - rtw89: fine tune BT-coexsitence to improve user experience
      - rtw89: firmware secure boot for WiFi 6 chip
 
  - Bluetooth
      - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
        0x13d3:0x3623
      - add Realtek RTL8852BE support for id Foxconn 0xe123
      - add MediaTek MT7920 support for wireless module ids
      - btintel_pcie: add handshake between driver and firmware
      - btintel_pcie: add recovery mechanism
      - btnxpuart: add GPIO support to power save feature
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmc8sukSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkLEYQAIMM6Qjh0bh3Byr3gOS1xZzXG+APLjP4
 9Jr0p3i+X53i90jvVqzeVO5FTc95MVHSKZ3kvPkDMXSLUaEJxocNHCI5Dzl/2/qL
 wWdpUB6/ou+jKB4Bn6Z8OvVODT7qrr0tVa9M2/fuKWrIsOU/ntIhG8EhnGddk5U/
 vKPSf5PUIb81uNRnF58VusY3wrT1dEoh9VfJYxL+ST+inPxjEAMy6Y+lmlsjGaSX
 jrS+Pp9KYiUwl3Qt0AQs+cG4OHkJdjbnChrfosWwpkiyddO8klVq06+wX/TiSzfF
 b9VZtBfy/GZs3lkE1mQkcILdtX5pP3YHQdpsuxFfVI0JHVszx2ck7WdoRux/8F0v
 kKZsYcO7bH9I1wMFP66Ff9hIbdEQaeucK+KdDkXyPNMfP91Vzmfjii8IBxOC36Ie
 BbOeFUrXyTxxJ2u0vf/X9JtIq8bcrkNrSd1n1jlGPMqG3FVzsY95+Oi4qfsyeUbl
 lS1PlVTqPMPFdX54HnxM3y2rJjhd7iXhkvmtuXNjRFThXlOiK3maAPWlM1aZ3b8u
 Vjs4JFUsW0tleZG+RzANjsGjXbf7AiPUGLZt+acem0K+fcjG4i5aGIAJrxwa/ORx
 eG74IZRt5cOI371W7gNLGHjwnuge8tFPgOWcRP2eozNm7jvMYALBejYS7eWUTvaf
 THcvVM+bupEZ
 =GzPr
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "The most significant set of changes is the per netns RTNL. The new
  behavior is disabled by default, regression risk should be contained.

  Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
  default value from PTP_1588_CLOCK_KVM, as the first is intended to be
  a more reliable replacement for the latter.

  Core:

   - Started a very large, in-progress, effort to make the RTNL lock
     scope per network-namespace, thus reducing the lock contention
     significantly in the containerized use-case, comprising:
       - RCU-ified some relevant slices of the FIB control path
       - introduce basic per netns locking helpers
       - namespacified the IPv4 address hash table
       - remove rtnl_register{,_module}() in favour of
         rtnl_register_many()
       - refactor rtnl_{new,del,set}link() moving as much validation as
         possible out of RTNL lock
       - convert all phonet doit() and dumpit() handlers to RCU
       - convert IPv4 addresses manipulation to per-netns RTNL
       - convert virtual interface creation to per-netns RTNL
     the per-netns lock infrastructure is guarded by the
     CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.

   - Introduce NAPI suspension, to efficiently switching between busy
     polling (NAPI processing suspended) and normal processing.

   - Migrate the IPv4 routing input, output and control path from direct
     ToS usage to DSCP macros. This is a work in progress to make ECN
     handling consistent and reliable.

   - Add drop reasons support to the IPv4 rotue input path, allowing
     better introspection in case of packets drop.

   - Make FIB seqnum lockless, dropping RTNL protection for read access.

   - Make inet{,v6} addresses hashing less predicable.

   - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
     and timestamps

  Things we sprinkled into general kernel code:

   - Add small file operations for debugfs, to reduce the struct ops
     size.

   - Refactoring and optimization for the implementation of page_frag
     API, This is a preparatory work to consolidate the page_frag
     implementation.

  Netfilter:

   - Optimize set element transactions to reduce memory consumption

   - Extended netlink error reporting for attribute parser failure.

   - Make legacy xtables configs user selectable, giving users the
     option to configure iptables without enabling any other config.

   - Address a lot of false-positive RCU issues, pointed by recent CI
     improvements.

  BPF:

   - Put xsk sockets on a struct diet and add various cleanups. Overall,
     this helps to bump performance by 12% for some workloads.

   - Extend BPF selftests to increase coverage of XDP features in
     combination with BPF cpumap.

   - Optimize and homogenize bpf_csum_diff helper for all archs and also
     add a batch of new BPF selftests for it.

   - Extend netkit with an option to delegate skb->{mark,priority}
     scrubbing to its BPF program.

   - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
     programs.

  Protocols:

   - Introduces 4-tuple hash for connected udp sockets, speeding-up
     significantly connected sockets lookup.

   - Add a fastpath for some TCP timers that usually expires after
     close, the socket lock contention.

   - Add inbound and outbound xfrm state caches to speed up state
     lookups.

   - Avoid sending MPTCP advertisements on stale subflows, reducing
     risks on loosing them.

   - Make neighbours table flushing more scalable, maintaining per
     device neigh lists.

  Driver API:

   - Introduce a unified interface to configure transmission H/W
     shaping, and expose it to user-space via generic-netlink.

   - Add support for per-NAPI config via netlink. This makes napi
     configuration persistent across queues removal and re-creation.
     Requires driver updates, currently supported drivers are:
     nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.

   - Add ethtool support for writing SFP / PHY firmware blocks.

   - Track RSS context allocation from ethtool core.

   - Implement support for mirroring to DSA CPU port, via TC mirror
     offload.

   - Consolidate FDB updates notification, to avoid duplicates on
     device-specific entries.

   - Expose DPLL clock quality level to the user-space.

   - Support master-slave PHY config via device tree.

  Tests and tooling:

   - forwarding: introduce deferred commands, to simplify the cleanup
     phase

  Drivers:

   - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
     Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
     IRQs and queues to NAPI IDs, allowing busy polling and better
     introspection.

   - Ethernet high-speed NICs:
      - nVidia/Mellanox:
         - mlx5:
           - a large refactor to implement support for cross E-Switch
             scheduling
           - refactor H/W conter management to let it scale better
           - H/W GRO cleanups
      - Intel (100G, ice)::
         - add support for ethtool reset
         - implement support for per TX queue H/W shaping
      - AMD/Solarflare:
         - implement per device queue stats support
      - Broadcom (bnxt):
         - improve wildcard l4proto on IPv4/IPv6 ntuple rules
      - Marvell Octeon:
         - Add representor support for each Resource Virtualization Unit
           (RVU) device.
      - Hisilicon:
         - add support for the BMC Gigabit Ethernet
      - IBM (EMAC):
         - driver cleanup and modernization
      - Cisco (VIC):
         - raise the queues number limit to 256

   - Ethernet virtual:
      - Google vNIC:
         - implement page pool support
      - macsec:
         - inherit lower device's features and TSO limits when
           offloading
      - virtio_net:
         - enable premapped mode by default
         - support for XDP socket(AF_XDP) zerocopy TX
      - wireguard:
         - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
           packets.

   - Ethernet NICs embedded and virtual:
      - Broadcom ASP:
         - enable software timestamping
      - Freescale:
         - add enetc4 PF driver
      - MediaTek: Airoha SoC:
         - implement BQL support
      - RealTek r8169:
         - enable TSO by default on r8168/r8125
         - implement extended ethtool stats
      - Renesas AVB:
         - enable TX checksum offload
      - Synopsys (stmmac):
         - support header splitting for vlan tagged packets
         - move common code for DWMAC4 and DWXGMAC into a separate FPE
           module.
         - add dwmac driver support for T-HEAD TH1520 SoC
      - Synopsys (xpcs):
         - driver refactor and cleanup
      - TI:
         - icssg_prueth: add VLAN offload support
      - Xilinx emaclite:
         - add clock support

   - Ethernet switches:
      - Microchip:
         - implement support for the lan969x Ethernet switch family
         - add LAN9646 switch support to KSZ DSA driver

   - Ethernet PHYs:
      - Marvel: 88q2x: enable auto negotiation
      - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2

   - PTP:
      - Add support for the Amazon virtual clock device
      - Add PtP driver for s390 clocks

   - WiFi:
      - mac80211
         - EHT 1024 aggregation size for transmissions
         - new operation to indicate that a new interface is to be added
         - support radio separation of multi-band devices
         - move wireless extension spy implementation to libiw
      - Broadcom:
         - brcmfmac: optional LPO clock support
      - Microchip:
         - add support for Atmel WILC3000
      - Qualcomm (ath12k):
         - firmware coredump collection support
         - add debugfs support for a multitude of statistics
      - Qualcomm (ath5k):
         -  Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
      - Realtek:
         - rtw88: 8821au and 8812au USB adapters support
         - rtw89: add thermal protection
         - rtw89: fine tune BT-coexsitence to improve user experience
         - rtw89: firmware secure boot for WiFi 6 chip

   - Bluetooth
      - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
        0x13d3:0x3623
      - add Realtek RTL8852BE support for id Foxconn 0xe123
      - add MediaTek MT7920 support for wireless module ids
      - btintel_pcie: add handshake between driver and firmware
      - btintel_pcie: add recovery mechanism
      - btnxpuart: add GPIO support to power save feature"

* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
  mm: page_frag: fix a compile error when kernel is not compiled
  Documentation: tipc: fix formatting issue in tipc.rst
  selftests: nic_performance: Add selftest for performance of NIC driver
  selftests: nic_link_layer: Add selftest case for speed and duplex states
  selftests: nic_link_layer: Add link layer selftest for NIC driver
  bnxt_en: Add FW trace coredump segments to the coredump
  bnxt_en: Add a new ethtool -W dump flag
  bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
  bnxt_en: Add functions to copy host context memory
  bnxt_en: Do not free FW log context memory
  bnxt_en: Manage the FW trace context memory
  bnxt_en: Allocate backing store memory for FW trace logs
  bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
  bnxt_en: Refactor bnxt_free_ctx_mem()
  bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
  bnxt_en: Update firmware interface spec to 1.10.3.85
  selftests/bpf: Add some tests with sockmap SK_PASS
  bpf: fix recursive lock when verdict program return SK_PASS
  wireguard: device: support big tcp GSO
  wireguard: selftests: load nf_conntrack if not present
  ...
2024-11-21 08:28:08 -08:00
Linus Torvalds 7d75606665 pwm: Changes for v6.13-rc1
This pull request prominently contains a new abstraction for PWM
 waveforms that is more expressive that the legacy one. Compared to the
 old abstraction it contains a duty_offset member instead of polarity.
 This new abstraction is already used in an ADC driver merged into the
 iio tree. So I expect you will get a part of this tree also via the iio
 pull request for 6.13-rc1 (tag pwm/duty_offset-for-6.13-rc1).
 
 Otherwise it's the usual collection of fixes, cleanups and dt doc
 updates.
 
 This time around thanks go to Andy Shevchenko, Clark Wang, Conor Dooley,
 David Lechner, Dimitri Fedrau, Frank Li, Jun Li, Kelvin Zhang, Krzysztof
 Kozlowski, Nuno Sa, Shen Lichuan and Trevor Gamblin for code
 contributions, testing and review.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEP4GsaTp6HlmJrf7Tj4D7WH0S/k4FAmc7DoAACgkQj4D7WH0S
 /k5tEgf/QJY8mAPPZR45dDo+GdUnZAaV85sseOezeApjB6kMYXKsKWoDC0uQ9m40
 t7zkR8rXCk84rYSg4fGpcWL12v03n8cXmABkJUsqkUkLCcU/pifKzxanC25IWMH1
 DGCW8tev4/NSe2ud9kLmFR/p85aioIW47Az3QH096Wv+Y5ij3v5e8PHBIaSiWHlb
 gfQI1XWerHSbAZexF132zGZOD/TBWb6djAQKACh5KWBPWB54zK3n3ngxoOCSMKSh
 Li8nfVyy32mPurLfTqaTaAHg7uGrcCGOVhqnXSQiuUayMlV/T7FX/uwfF/X/YKFm
 iqIPoYeUhLHmHJkHLACtPzUajkTJbg==
 =3d9l
 -----END PGP SIGNATURE-----

Merge tag 'pwm/for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux

Pull pwm updates from Uwe Kleine-König:
 "This contains a new abstraction for PWM waveforms that is more
  expressive that the legacy one. Compared to the old abstraction it
  contains a duty_offset member instead of polarity. This new
  abstraction is already used in an ADC driver merged into the iio tree.

  The new API requires changes to the lowlevel drivers. For now there
  are two drivers that are converted to the new API (axi-pwmgen and
  stm32). Converted drivers continue to work with the old API. Drivers
  not yet converted only work with the older API.

  Otherwise it's the usual collection of fixes, cleanups and dt doc
  updates.

  This time around thanks go to Andy Shevchenko, Clark Wang, Conor
  Dooley, David Lechner, Dimitri Fedrau, Frank Li, Jun Li, Kelvin Zhang,
  Krzysztof Kozlowski, Nuno Sa, Shen Lichuan and Trevor Gamblin for code
  contributions, testing and review"

* tag 'pwm/for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
  pwm: Assume a disabled PWM to emit a constant inactive output
  pwm: core: export pwm_get_state_hw()
  pwm: core: use device_match_name() instead of strcmp(dev_name(...
  dt-bindings: pwm: adi,axi-pwmgen: Increase #pwm-cells to 3
  pwm: imx27: Use clk_bulk_*() API to simplify clock handling
  pwm: imx27: Workaround of the pwm output bug when decrease the duty cycle
  pwm: axi-pwmgen: Enable FORCE_ALIGN by default
  pwm: axi-pwmgen: Rename 0x10 register
  dt-bindings: pwm: amlogic: Document C3 PWM
  pwm: axi-pwmgen: Create a dedicated function for getting driver data from a chip
  pwm: atmel-tcb: Use min() macro
  pwm: stm32: Fix error checking for a regmap_read() call
  pwm: Add kernel doc for members added to pwm_ops recently
  pwm: Reorder symbols in core.c
  pwm: stm32: Implementation of the waveform callbacks
  pwm: axi-pwmgen: Implementation of the waveform callbacks
  pwm: Add tracing for waveform callbacks
  pwm: Provide new consumer API functions for waveforms
  pwm: New abstraction for PWM waveforms
  pwm: Add more locking
2024-11-20 12:34:22 -08:00
Linus Torvalds c1f2ffe207 - Log and handle twp new AMD-specific MCA registers: SYND1 and SYND2 and
report the Field Replaceable Unit text info reported through them
 
 - Add support for handling variable-sized SMCA BERT records
 
 - Add the capability for reporting vendor-specific RAS error info without
   adding vendor-specific fields to struct mce
 
 - Cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmc7OlEACgkQEsHwGGHe
 VUpXihAAgVdZExo/1Rmbh6s/259BH38GP6fL+ePaT1SlUzNi770TY2b7I4OYlms4
 xa9t8LAIVMrrIMIg6w6q8JN4YHAQoVdcbRBvHQYB1a24xtoyxaEJxLKQNLA1soUQ
 Jc9asWMHBuXnLfR/4S8Y2vWrzByOSwxqDBzQCu0Ryqvbg7vdRicNt+Hk9oHHIAYy
 cquZpoDGL3W6BA8sXONbEW/6rcQ33JsEQ+Ub4qr1q2g+kNwXrrFuXZlojmz2MxIs
 xgqeYKyrxK6heX0l8dSiipCATA+sOXXWWzbZtdPjFtDGzwIlV3p4yXN3fucrmHm1
 4Fg1gW5a1V82Qosn0FbGiZPojsahhOE2k1bz+yEMDM3Sg2qeRWcK+V3jiS5zKzPd
 WWqUbRtcaxayoEsAXnWrxrp3vxhlUUf1Ivtgk8mlMjhHPLijV5iranrRj+XHEikR
 H0D3Vm0T1LHCPf9AUsbmo0GAfAOeO9DTAB9LJdKv+OJ4ESVgSPJW/9NKWLXKq41p
 hhs7seJTYNw8sp67cL23TnkSp3S+9kd2U7Od3T1kubtd4fVxVnlowu8Fc6kjqd8v
 n+GbdLxhX7GbOgnT0z2OG5Xmc1pNW1JtRbuxSK59NFNia7r6ZkR7BE/OCtL82Rfm
 u7i76z1O0lV91y93GMCyP9DYn8K1ceU7gVCveY6mx/AHgzc87d8=
 =djpG
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Log and handle twp new AMD-specific MCA registers: SYND1 and SYND2
   and report the Field Replaceable Unit text info reported through them

 - Add support for handling variable-sized SMCA BERT records

 - Add the capability for reporting vendor-specific RAS error info
   without adding vendor-specific fields to struct mce

 - Cleanups

* tag 'ras_core_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/mce_amd: Add support for FRU text in MCA
  x86/mce/apei: Handle variable SMCA BERT record size
  x86/MCE/AMD: Add support for new MCA_SYND{1,2} registers
  tracing: Add __print_dynamic_array() helper
  x86/mce: Add wrapper for struct mce to export vendor specific info
  x86/mce/intel: Use MCG_BANKCNT_MASK instead of 0xff
  x86/mce/mcelog: Use xchg() to get and clear the flags
2024-11-19 12:04:51 -08:00
Linus Torvalds 8350142a4b for-6.13/io_uring-20241118
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmc7S3kQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjHVEAC+CITBEcGy+S0IK0BpIAhuA+A621LtqBwy
 0z/4MZKXMqvWxcFGQJ9Zr8MvxUnY4KFcssiaR5zk+I9TczNu7mLMuPYD1Gb0Klgz
 mwuFOylo1CAAC41IABYZZ/0qWbTaW0p8tpaGsTbTNk3tBxuMLB550+APAqC1OE9U
 bb7rP+FHc5+YGI9/7JNWt7NNTSHvVSO6oxjltCxHr1dRg93Jtr2jaY6letY3epFz
 TCFyfJlDtK8fPwtYRyG51M4g2Vdp9/4qsfPqvnXwUr9MdWaVh5/TFkyvqDi5sCKM
 zdK/sjRiimYzvqqKg6bzgYscITUPNk2TG6ZJq5U1L7lrglzVY69c7GIUnNzPrL/y
 AxQsR5Guxz3bRNYWZ4BKJDH+NNB+cgIFEXDsv72qoUy3HTzA6wOPZYxfjhZhKuG/
 DjRwM7NGx5oPiKtpK99IulZttXdmtkH0csuLwKmOzrQskQdTuWyrEtU7UQql7oQ5
 Rt3DhMXouzYZMicB8U5Q9gO2I3WN+2VVxXl4sa00LG8KsT6PzLnz4Q2k/1c83S6J
 rRivRbZAbZ1+BqKvF8T7GgzLCeaLgzbeoxmxj6xr87pf3SYEs2KhQeQ+n/C0HTOt
 GOcG1+bvh7t2aSvlBPKVCExWI4erwG6wXFhfGKsLW9CmwIMqRNxdePpRWe3Cueyp
 M3QRJuvTxQ==
 =bDvp
 -----END PGP SIGNATURE-----

Merge tag 'for-6.13/io_uring-20241118' of git://git.kernel.dk/linux

Pull io_uring updates from Jens Axboe:

 - Cleanups of the eventfd handling code, making it fully private.

 - Support for sending a sync message to another ring, without having a
   ring available to send a normal async message.

 - Get rid of the separate unlocked hash table, unify everything around
   the single locked one.

 - Add support for ring resizing. It can be hard to appropriately size
   the CQ ring upfront, if the application doesn't know how busy it will
   be. This results in applications sizing rings for the most busy case,
   which can be wasteful. With ring resizing, they can start small and
   grow the ring, if needed.

 - Add support for fixed wait regions, rather than needing to copy the
   same wait data tons of times for each wait operation.

 - Rewrite the resource node handling, which before was serialized per
   ring. This caused issues with particularly fixed files, where one
   file waiting on IO could hold up putting and freeing of other
   unrelated files. Now each node is handled separately. New code is
   much simpler too, and was a net 250 line reduction in code.

 - Add support for just doing partial buffer clones, rather than always
   cloning the entire buffer table.

 - Series adding static NAPI support, where a specific NAPI instance is
   used rather than having a list of them available that need lookup.

 - Add support for mapped regions, and also convert the fixed wait
   support mentioned above to that concept. This avoids doing special
   mappings for various planned features, and folds the existing
   registered wait into that too.

 - Add support for hybrid IO polling, which is a variant of strict
   IOPOLL but with an initial sleep delay to avoid spinning too early
   and wasting resources on devices that aren't necessarily in the < 5
   usec category wrt latencies.

 - Various cleanups and little fixes.

* tag 'for-6.13/io_uring-20241118' of git://git.kernel.dk/linux: (79 commits)
  io_uring/region: fix error codes after failed vmap
  io_uring: restore back registered wait arguments
  io_uring: add memory region registration
  io_uring: introduce concept of memory regions
  io_uring: temporarily disable registered waits
  io_uring: disable ENTER_EXT_ARG_REG for IOPOLL
  io_uring: fortify io_pin_pages with a warning
  switch io_msg_ring() to CLASS(fd)
  io_uring: fix invalid hybrid polling ctx leaks
  io_uring/uring_cmd: fix buffer index retrieval
  io_uring/rsrc: add & apply io_req_assign_buf_node()
  io_uring/rsrc: remove '->ctx_ptr' of 'struct io_rsrc_node'
  io_uring/rsrc: pass 'struct io_ring_ctx' reference to rsrc helpers
  io_uring: avoid normal tw intermediate fallback
  io_uring/napi: add static napi tracking strategy
  io_uring/napi: clean up __io_napi_do_busy_loop
  io_uring/napi: Use lock guards
  io_uring/napi: improve __io_napi_add
  io_uring/napi: fix io_napi_entry RCU accesses
  io_uring/napi: protect concurrent io_napi_entry timeout accesses
  ...
2024-11-18 17:02:57 -08:00
Linus Torvalds 77a0cfafa9 for-6.13/block-20241118
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmc7S40QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjHVD/43rDZ8ehs+IAAr6S0RemNX1SRG0mK2UOEb
 kMoNogS7StO/c4JYW3JuzCyLRn5ZsgeWV/muqxwDEWQrmTGrvi+V45KikrZPwm3k
 p0ump33qV9EU2jiR1MKZjtwK2P0CI7/DD3W8ww6IOvKbTT7RcqQcdHznvXArFBtc
 xCuQPpayFG7ZasC+N9VaBwtiUEVgU3Ek9AFT7UVZRWajjHPNalQwaooJWayO0rEG
 KdoW5yG0ryLrgCY2ACSvRLS+2s14EJtb8hgT08WKHTNgd5LxhSKxfsTapamua+7U
 FdVS6Ij0tEkgu2jpvgj7QKO0Uw10Cnep2gj7RHts/LVewvkliS6XcheOzqRS1jWU
 I2EI+UaGOZ11OUiw52VIveEVS5zV/NWhgy5BSP9LYEvXw0BUAHRDYGMem8o5G1V1
 SWqjIM1UWvcQDlAnMF9FDVzojvjVUmYWvcAlFFztO8J0B7SavHR3NcfHwEf57reH
 rNoUbi/9c4/wjJJF33gejiR5pU+ewy/Mk75GrtX3xpEqlztfRbf9/FbPCMEAO1KR
 DF/b3lkUV9i2/BRW6a0SpZ5RDSmSYMnateel6TrPyVSRnpiSSFO8FrbynwUOa17b
 6i49YDFWzzXOrR1YWDg6IEtTrcmBEmvi7F6aoDs020qUnL0hwLn1ZuoIxuiFEpor
 Z0iFF1B/nw==
 =PWTH
 -----END PGP SIGNATURE-----

Merge tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - NVMe updates via Keith:
      - Use uring_cmd helper (Pavel)
      - Host Memory Buffer allocation enhancements (Christoph)
      - Target persistent reservation support (Guixin)
      - Persistent reservation tracing (Guixen)
      - NVMe 2.1 specification support (Keith)
      - Rotational Meta Support (Matias, Wang, Keith)
      - Volatile cache detection enhancment (Guixen)

 - MD updates via Song:
      - Maintainers update
      - raid5 sync IO fix
      - Enhance handling of faulty and blocked devices
      - raid5-ppl atomic improvement
      - md-bitmap fix

 - Support for manually defining embedded partition tables

 - Zone append fixes and cleanups

 - Stop sending the queued requests in the plug list to the driver
   ->queue_rqs() handle in reverse order.

 - Zoned write plug cleanups

 - Cleanups disk stats tracking and add support for disk stats for
   passthrough IO

 - Add preparatory support for file system atomic writes

 - Add lockdep support for queue freezing. Already found a bunch of
   issues, and some fixes for that are in here. More will be coming.

 - Fix race between queue stopping/quiescing and IO queueing

 - ublk recovery improvements

 - Fix ublk mmap for 64k pages

 - Various fixes and cleanups

* tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux: (118 commits)
  MAINTAINERS: Update git tree for mdraid subsystem
  block: make struct rq_list available for !CONFIG_BLOCK
  block/genhd: use seq_put_decimal_ull for diskstats decimal values
  block: don't reorder requests in blk_mq_add_to_batch
  block: don't reorder requests in blk_add_rq_to_plug
  block: add a rq_list type
  block: remove rq_list_move
  virtio_blk: reverse request order in virtio_queue_rqs
  nvme-pci: reverse request order in nvme_queue_rqs
  btrfs: validate queue limits
  block: export blk_validate_limits
  nvmet: add tracing of reservation commands
  nvme: parse reservation commands's action and rtype to string
  nvmet: report ns's vwc not present
  md/raid5: Increase r5conf.cache_name size
  block: remove the ioprio field from struct request
  block: remove the write_hint field from struct request
  nvme: check ns's volatile write cache not present
  nvme: add rotational support
  nvme: use command set independent id ns if available
  ...
2024-11-18 16:50:08 -08:00
Linus Torvalds c14a8a4c04 for-6.13-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmc0zT4ACgkQxWXV+ddt
 WDtThRAAhzSSiHcJqTfCL5nHh7w85MNEVw28o1ETgXSYJmx0JOWLE7Znlp2FV7jj
 IbYkFfF2gXJzYvRZkcXB/TAHV9KJG5yZIBZfccbM+9db9f8xkImVKMuqQRXPU41R
 ppSCmqZTeujtt8ucsaJkMpm6pzECKJCJaGOsMJ8fiqKpo89dKO3eGAVboSbpPF4C
 r0YmppiBwSP/cCXQCqWxZRbqPGN+lUgZpIGNRi157kehfmRHlVVJTO1pgqK8PCXb
 uIT09Kulppfez8+1A10CPcniDTyinLik/qLTNlzdWoDBL4iNJMg0A0wsA04AJVf0
 PdOS0REusiv3QcEIO6PefuRFRRfXcSLPpPDUceltJT5O0uM2gUqf2C7dEHXUGU3o
 TdgYlbQpsJWpZ7VGWQDZeGGV04lOPQvu0LGLPgEerUQd5H9ABa0dX8Fn0sPhKsa8
 whpAcdfE4rdNxB2OJFnqQeFq0z3cSjP/rvKlluCmAj97QYI+kiu3QyhemcT1YSC9
 U7n5Ya9IzIYCN3ml54q3hEgyD0IVGGG20GuUmqC9XSP9mrQRC8I1g7v26AiOTrrk
 VhgSdtMmphDxXudifsnYMaQ0Z1QqiUrW1SM/prAEOnBYCo75+HDsTgrq9ithgHoI
 4xz4YXJyMRs18qfTJctXC1wmGuz5plTdQrwarHdNsELN5HEyqX4=
 =aAcf
 -----END PGP SIGNATURE-----

Merge tag 'for-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "Changes outside of btrfs: add io_uring command flag to track a dying
  task (the rest will go via the block git tree).

  User visible changes:

   - wire encoded read (ioctl) to io_uring commands, this can be used on
     itself, in the future this will allow 'send' to be asynchronous. As
     a consequence, the encoded read ioctl can also work in non-blocking
     mode

   - new ioctl to wait for cleaned subvolumes, no need to use the
     generic and root-only SEARCH_TREE ioctl, will be used by "btrfs
     subvol sync"

   - recognize different paths/symlinks for the same devices and don't
     report them during rescanning, this can be observed with LVM or DM

   - seeding device use case change, the sprout device (the one
     capturing new writes) will not clear the read-only status of the
     super block; this prevents accumulating space from deleted
     snapshots

  Performance improvements:

   - reduce lock contention when traversing extent buffers

   - reduce extent tree lock contention when searching for inline
     backref

   - switch from rb-trees to xarray for delayed ref tracking,
     improvements due to better cache locality, branching factors and
     more compact data structures

   - enable extent map shrinker again (prevent memory exhaustion under
     some types of IO load), reworked to run in a single worker thread
     (there used to be problems causing long stalls under memory
     pressure)

  Core changes:

   - raid-stripe-tree feature updates:
       - make device replace and scrub work
       - implement partial deletion of stripe extents
       - new selftests

   - split the config option BTRFS_DEBUG and add EXPERIMENTAL for
     features that are experimental or with known problems so we don't
     misuse debugging config for that

   - subpage mode updates (sector < page):
       - update compression implementations
       - update writepage, writeback

   - continued folio API conversions:
       - buffered writes

   - make buffered write copy one page at a time, preparatory work for
     future integration with large folios, may cause performance drop

   - proper locking of root item regarding starting send

   - error handling improvements

   - code cleanups and refactoring:
       - dead code removal
       - unused parameter reduction
       - lockdep assertions"

* tag 'for-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (119 commits)
  btrfs: send: check for read-only send root under critical section
  btrfs: send: check for dead send root under critical section
  btrfs: remove check for NULL fs_info at btrfs_folio_end_lock_bitmap()
  btrfs: fix warning on PTR_ERR() against NULL device at btrfs_control_ioctl()
  btrfs: fix a typo in btrfs_use_zone_append
  btrfs: avoid superfluous calls to free_extent_map() in btrfs_encoded_read()
  btrfs: simplify logic to decrement snapshot counter at btrfs_mksnapshot()
  btrfs: remove hole from struct btrfs_delayed_node
  btrfs: update stale comment for struct btrfs_delayed_ref_node::add_list
  btrfs: add new ioctl to wait for cleaned subvolumes
  btrfs: simplify range tracking in cow_file_range()
  btrfs: remove conditional path allocation in btrfs_read_locked_inode()
  btrfs: push cleanup into btrfs_read_locked_inode()
  io_uring/cmd: let cmds to know about dying task
  btrfs: add struct io_btrfs_cmd as type for io_uring_cmd_to_pdu()
  btrfs: add io_uring command for encoded reads (ENCODED_READ ioctl)
  btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages()
  btrfs: don't sleep in btrfs_encoded_read() if IOCB_NOWAIT is set
  btrfs: change btrfs_encoded_read() so that reading of extent is done by caller
  btrfs: remove pointless iocb::ki_pos addition in btrfs_encoded_read()
  ...
2024-11-18 16:37:41 -08:00
Linus Torvalds 8dcf44fcad vfs-6.13.netfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcUoQAKCRCRxhvAZXjc
 omxAAP9WE8zSxeu7Foa6+OmLO62mYdB8rRrQ4OjzX+zunL0UnAD9FAHPsB4amWm4
 /zK3Nf7ipijop5+RgSJTgURffASKOgI=
 =ifkk
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull netfs updates from Christian Brauner:
 "Various fixes for the netfs library and related infrastructure:

  cachefiles:

   - Fix a dentry leak in cachefiles_open_file()

   - Fix incorrect length return value in
     cachefiles_ondemand_fd_write_iter()

   - Fix missing pos updates in cachefiles_ondemand_fd_write_iter()

   - Clean up in cachefiles_commit_tmpfile()

   - Fix NULL pointer dereference in object->file

   - Add a memory barrier for FSCACHE_VOLUME_CREATING

  netfs:

   - Remove call to folio_index()

   - Fix a few minor bugs in netfs_page_mkwrite()

   - Remove unnecessary references to pages"

* tag 'vfs-6.13.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs/fscache: Add a memory barrier for FSCACHE_VOLUME_CREATING
  cachefiles: Fix NULL pointer dereference in object->file
  cachefiles: Clean up in cachefiles_commit_tmpfile()
  cachefiles: Fix missing pos updates in cachefiles_ondemand_fd_write_iter()
  cachefiles: Fix incorrect length return value in cachefiles_ondemand_fd_write_iter()
  netfs: Remove unnecessary references to pages
  netfs: Fix a few minor bugs in netfs_page_mkwrite()
  netfs: Remove call to folio_index()
2024-11-18 10:26:49 -08:00
Linus Torvalds 70e7730c2a vfs-6.13.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcToAAKCRCRxhvAZXjc
 osL9AP948FFumJRC28gDJ4xp+X4eohNOfkgoEG8FTbF2zU6ulwD+O0pr26FqpFli
 pqlG+38UdATImpfqqWjPbb72sBYcfQg=
 =wLUh
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:

   - Fixup and improve NLM and kNFSD file lock callbacks

     Last year both GFS2 and OCFS2 had some work done to make their
     locking more robust when exported over NFS. Unfortunately, part of
     that work caused both NLM (for NFS v3 exports) and kNFSD (for
     NFSv4.1+ exports) to no longer send lock notifications to clients

     This in itself is not a huge problem because most NFS clients will
     still poll the server in order to acquire a conflicted lock

     It's important for NLM and kNFSD that they do not block their
     kernel threads inside filesystem's file_lock implementations
     because that can produce deadlocks. We used to make sure of this by
     only trusting that posix_lock_file() can correctly handle blocking
     lock calls asynchronously, so the lock managers would only setup
     their file_lock requests for async callbacks if the filesystem did
     not define its own lock() file operation

     However, when GFS2 and OCFS2 grew the capability to correctly
     handle blocking lock requests asynchronously, they started
     signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check
     for also trusting posix_lock_file() was inadvertently dropped, so
     now most filesystems no longer produce lock notifications when
     exported over NFS

     Fix this by using an fop_flag which greatly simplifies the problem
     and grooms the way for future uses by both filesystems and lock
     managers alike

   - Add a sysctl to delete the dentry when a file is removed instead of
     making it a negative dentry

     Commit 681ce86235 ("vfs: Delete the associated dentry when
     deleting a file") introduced an unconditional deletion of the
     associated dentry when a file is removed. However, this led to
     performance regressions in specific benchmarks, such as
     ilebench.sum_operations/s, prompting a revert in commit
     4a4be1ad3a ("Revert "vfs: Delete the associated dentry when
     deleting a file""). This reintroduces the concept conditionally
     through a sysctl

   - Expand the statmount() system call:

       * Report the filesystem subtype in a new fs_subtype field to
         e.g., report fuse filesystem subtypes

       * Report the superblock source in a new sb_source field

       * Add a new way to return filesystem specific mount options in an
         option array that returns filesystem specific mount options
         separated by zero bytes and unescaped. This allows caller's to
         retrieve filesystem specific mount options and immediately pass
         them to e.g., fsconfig() without having to unescape or split
         them

       * Report security (LSM) specific mount options in a separate
         security option array. We don't lump them together with
         filesystem specific mount options as security mount options are
         generic and most users aren't interested in them

         The format is the same as for the filesystem specific mount
         option array

   - Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command

   - Optimize acl_permission_check() to avoid costly {g,u}id ownership
     checks if possible

   - Use smp_mb__after_spinlock() to avoid full smp_mb() in evict()

   - Add synchronous wakeup support for ep_poll_callback.

     Currently, epoll only uses wake_up() to wake up task. But sometimes
     there are epoll users which want to use the synchronous wakeup flag
     to give a hint to the scheduler, e.g., the Android binder driver.
     So add a wake_up_sync() define, and use wake_up_sync() when sync is
     true in ep_poll_callback()

  Fixes:

   - Fix kernel documentation for inode_insert5() and iget5_locked()

   - Annotate racy epoll check on file->f_ep

   - Make F_DUPFD_QUERY associative

   - Avoid filename buffer overrun in initramfs

   - Don't let statmount() return empty strings

   - Add a cond_resched() to dump_user_range() to avoid hogging the CPU

   - Don't query the device logical blocksize multiple times for hfsplus

   - Make filemap_read() check that the offset is positive or zero

  Cleanups:

   - Various typo fixes

   - Cleanup wbc_attach_fdatawrite_inode()

   - Add __releases annotation to wbc_attach_and_unlock_inode()

   - Add hugetlbfs tracepoints

   - Fix various vfs kernel doc parameters

   - Remove obsolete TODO comment from io_cancel()

   - Convert wbc_account_cgroup_owner() to take a folio

   - Fix comments for BANDWITH_INTERVAL and wb_domain_writeout_add()

   - Reorder struct posix_acl to save 8 bytes

   - Annotate struct posix_acl with __counted_by()

   - Replace one-element array with flexible array member in freevxfs

   - Use idiomatic atomic64_inc_return() in alloc_mnt_ns()"

* tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
  statmount: retrieve security mount options
  vfs: make evict() use smp_mb__after_spinlock instead of smp_mb
  statmount: add flag to retrieve unescaped options
  fs: add the ability for statmount() to report the sb_source
  writeback: wbc_attach_fdatawrite_inode out of line
  writeback: add a __releases annoation to wbc_attach_and_unlock_inode
  fs: add the ability for statmount() to report the fs_subtype
  fs: don't let statmount return empty strings
  fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel()
  hfsplus: don't query the device logical block size multiple times
  freevxfs: Replace one-element array with flexible array member
  fs: optimize acl_permission_check()
  initramfs: avoid filename buffer overrun
  fs/writeback: convert wbc_account_cgroup_owner to take a folio
  acl: Annotate struct posix_acl with __counted_by()
  acl: Realign struct posix_acl to save 8 bytes
  epoll: Add synchronous wakeup support for ep_poll_callback
  coredump: add cond_resched() to dump_user_range
  mm/page-writeback.c: Fix comment of wb_domain_writeout_add()
  mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL
  ...
2024-11-18 09:35:30 -08:00
Linus Torvalds 6ac81fd55e vfs-6.13.mgtime
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcScQAKCRCRxhvAZXjc
 oj+5AP4k822a77wc/3iPFk379naIvQ4dsrgemh0/Pb6ZvzvkFQEAi3vFCfzCDR2x
 SkJF/RwXXKZv6U31QXMRt2Qo6wfBuAc=
 =nVlm
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs multigrain timestamps from Christian Brauner:
 "This is another try at implementing multigrain timestamps. This time
  with significant help from the timekeeping maintainers to reduce the
  performance impact.

  Thomas provided a base branch that contains the required timekeeping
  interfaces for the VFS. It serves as the base for the multi-grain
  timestamp work:

   - Multigrain timestamps allow the kernel to use fine-grained
     timestamps when an inode's attributes is being actively observed
     via ->getattr(). With this support, it's possible for a file to get
     a fine-grained timestamp, and another modified after it to get a
     coarse-grained stamp that is earlier than the fine-grained time. If
     this happens then the files can appear to have been modified in
     reverse order, which breaks VFS ordering guarantees.

     To prevent this, a floor value is maintained for multigrain
     timestamps. Whenever a fine-grained timestamp is handed out, record
     it, and when later coarse-grained stamps are handed out, ensure
     they are not earlier than that value. If the coarse-grained
     timestamp is earlier than the fine-grained floor, return the floor
     value instead.

     The timekeeper changes add a static singleton atomic64_t into
     timekeeper.c that is used to keep track of the latest fine-grained
     time ever handed out. This is tracked as a monotonic ktime_t value
     to ensure that it isn't affected by clock jumps. Because it is
     updated at different times than the rest of the timekeeper object,
     the floor value is managed independently of the timekeeper via a
     cmpxchg() operation, and sits on its own cacheline.

     Two new public timekeeper interfaces are added:

      (1) ktime_get_coarse_real_ts64_mg() fills a timespec64 with the
          later of the coarse-grained clock and the floor time

      (2) ktime_get_real_ts64_mg() gets the fine-grained clock value,
          and tries to swap it into the floor. A timespec64 is filled
          with the result.

   - The VFS has always used coarse-grained timestamps when updating the
     ctime and mtime after a change. This has the benefit of allowing
     filesystems to optimize away a lot metadata updates, down to around
     1 per jiffy, even when a file is under heavy writes.

     Unfortunately, this has always been an issue when we're exporting
     via NFSv3, which relies on timestamps to validate caches. A lot of
     changes can happen in a jiffy, so timestamps aren't sufficient to
     help the client decide when to invalidate the cache. Even with
     NFSv4, a lot of exported filesystems don't properly support a
     change attribute and are subject to the same problems with
     timestamp granularity. Other applications have similar issues with
     timestamps (e.g backup applications).

     If we were to always use fine-grained timestamps, that would
     improve the situation, but that becomes rather expensive, as the
     underlying filesystem would have to log a lot more metadata
     updates.

     This adds a way to only use fine-grained timestamps when they are
     being actively queried. Use the (unused) top bit in
     inode->i_ctime_nsec as a flag that indicates whether the current
     timestamps have been queried via stat() or the like. When it's set,
     we allow the kernel to use a fine-grained timestamp iff it's
     necessary to make the ctime show a different value.

     This solves the problem of being able to distinguish the timestamp
     between updates, but introduces a new problem: it's now possible
     for a file being changed to get a fine-grained timestamp. A file
     that is altered just a bit later can then get a coarse-grained one
     that appears older than the earlier fine-grained time. This
     violates timestamp ordering guarantees.

     This is where the earlier mentioned timkeeping interfaces help. A
     global monotonic atomic64_t value is kept that acts as a timestamp
     floor. When we go to stamp a file, we first get the latter of the
     current floor value and the current coarse-grained time. If the
     inode ctime hasn't been queried then we just attempt to stamp it
     with that value.

     If it has been queried, then first see whether the current coarse
     time is later than the existing ctime. If it is, then we accept
     that value. If it isn't, then we get a fine-grained time and try to
     swap that into the global floor. Whether that succeeds or fails, we
     take the resulting floor time, convert it to realtime and try to
     swap that into the ctime.

     We take the result of the ctime swap whether it succeeds or fails,
     since either is just as valid.

     Filesystems can opt into this by setting the FS_MGTIME fstype flag.
     Others should be unaffected (other than being subject to the same
     floor value as multigrain filesystems)"

* tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: reduce pointer chasing in is_mgtime() test
  tmpfs: add support for multigrain timestamps
  btrfs: convert to multigrain timestamps
  ext4: switch to multigrain timestamps
  xfs: switch to multigrain timestamps
  Documentation: add a new file documenting multigrain timestamps
  fs: add percpu counters for significant multigrain timestamp events
  fs: tracepoints around multigrain timestamp events
  fs: handle delegated timestamps in setattr_copy_mgtime
  timekeeping: Add percpu counter for tracking floor swap events
  timekeeping: Add interfaces for handling timestamps with a floor value
  fs: have setattr_copy handle multigrain timestamps appropriately
  fs: add infrastructure for multigrain timestamps
2024-11-18 09:15:39 -08:00
Xianglai Li 948ccbd950 LoongArch: KVM: Add iocsr and mmio bus simulation in kernel
Add iocsr and mmio memory read and write simulation to the kernel. When
the VM accesses the device address space through iocsr instructions or
mmio, it does not need to return to the qemu user mode but can directly
completes the access in the kernel mode.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-13 16:18:26 +08:00
Christoph Hellwig 6975c1a486 block: remove the ioprio field from struct request
The request ioprio is only initialized from the first attached bio,
so requests without a bio already never set it.  Directly use the
bio field instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241112170050.1612998-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-12 14:42:02 -07:00
Vlastimil Babka 9b5c87d479 mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
Since 7d6be67cfd ("mm: mmap_lock: replace get_memcg_path_buf() with
on-stack buffer") we use trace_mmap_lock_reg()/unreg() only to maintain an
atomic reg_refcount which is checked to avoid performing
get_mm_memcg_path() in case none of the tracepoints using it is enabled.

This can be achieved directly by putting all the work needed for the
tracepoint behind the trace_mmap_lock_##type##_enabled(), as suggested by
Documentation/trace/tracepoints.rst and with the following advantages:

- uses the tracepoint's static key instead of evaluating a branch

- the check tracepoint specific, not shared by all of them

- we can get rid of trace_mmap_lock_reg()/unreg() completely

Thus use the trace_..._enabled() check and remove unnecessary code.

Link: https://lkml.kernel.org/r/20241105113456.95066-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11 17:22:28 -08:00
David Howells 8b9a7bd4d6 rxrpc: Add a tracepoint for aborts being proposed
Add a tracepoint to rxrpc to trace the proposal of an abort.  The abort is
performed asynchronously by the I/O thread.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/726356.1730898045@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11 15:27:46 -08:00
Filipe Manana e7fa845010 btrfs: rename extent map shrinker members from struct btrfs_fs_info
The names for the members of struct btrfs_fs_info related to the extent
map shrinker are a bit too long, so rename them to be shorter by replacing
the "extent_map_" prefix with the "em_" prefix.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00
Filipe Manana 70a5f9e266 btrfs: simplify tracking progress for the extent map shrinker
Now that the extent map shrinker can only be run by a single task (as a
work queue item) there is no need to keep the progress of the shrinker
protected by a spinlock and passing the progress to trace events as
parameters. So remove the lock and simplify the arguments for the trace
events.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11 14:34:17 +01:00