Commit Graph

11694 Commits (f4e47affdb2edb99834dfd03f5ea8a899bbb98f0)

Author SHA1 Message Date
Linus Torvalds b3570b00dc Locking changes for v6.16:
Futexes:
 
    - Add support for task local hash maps (Sebastian Andrzej Siewior,
      Peter Zijlstra)
 
    - Implement the FUTEX2_NUMA ABI, which feature extends the futex
      interface to be NUMA-aware. On NUMA-aware futexes a second u32
      word containing the NUMA node is added to after the u32 futex value
      word. (Peter Zijlstra)
 
    - Implement the FUTEX2_MPOL ABI, which feature extends the futex
      interface to be mempolicy-aware as well, to further refine futex
      node mappings and lookups. (Peter Zijlstra)
 
   Locking primitives:
 
    - Misc cleanups (Andy Shevchenko, Borislav Petkov, Colin Ian King,
                     Ingo Molnar, Nam Cao, Peter Zijlstra)
 
   Lockdep:
 
    - Prevent abuse of lockdep subclasses (Waiman Long)
    - Add number of dynamic keys to /proc/lockdep_stats (Waiman Long)
 
 Plus misc cleanups and fixes.
 
 Note that the tree includes the following dependent out-of-subsystem
 changes as well:
 
  - rcuref: Provide rcuref_is_dead()
  - mm: Add vmalloc_huge_node()
  - mm: Add the mmap_read_lock guard to <linux/mmap_lock.h>
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy3E8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1isNw/9FS6+ZReiV3NLHvhwIw8+6U2vV733wLY+
 mFzDk2CRwv2d6xg+QUrhLNI93i2fZnwNvK1f6LcRZMa1pNmwCcEghKgm0G+fRgbv
 skiGrlkUCoEqsDUxRW++/aTBcMo0vqG3NOObnUOrddG2W9tfrR8jq/EwlzB99dO7
 q8qaBNl9W1vLT3gh9/RPP5uKt0NKIf8ObvsyhWCGaywg81h2lC4AHf0Xlj3ZD95T
 TO5jhUhl/muhYtaqxeYPK0gDtCrgFz8NwZdjKx1nyP7Gbko6+L50AvOVXog0SIAU
 nncftvutGJg2ki7dbSYPDoHQrHO0JsF1vUfVZRjaKFebWpFo2yYdNMbITOeXVhSC
 QSpbH2qvyn21nT/YSj9dottHWBoNYBEgrcSf6DO4g0d8A0Jh7egXjQdA852RpeQ0
 LWGYx4rfiKhnjiXlKKQHrURZkcxxa40o+ls3RfFl2/kWA+7aUybvw6nAeDEkV0oL
 s2U0vZxsY37EPWDm40rTe9r4YpPqcB65i9YIesPzhtbcHJVmN0gts0o5l+x53GhR
 CeftFiiUi2nm6JaT+1wGvBDT3hQ8+NZ8GkPSeA6pEJWE3i4KquZlcBZLOSLZ3k/B
 df58zQi99Yun33is5f1kqDNspqvJOg/1nxUK68PgNSdCMKeuZkJYrcmh/rKNnXSC
 f7M1XHoWFb0=
 =La/x
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "Futexes:

   - Add support for task local hash maps (Sebastian Andrzej Siewior,
     Peter Zijlstra)

   - Implement the FUTEX2_NUMA ABI, which feature extends the futex
     interface to be NUMA-aware. On NUMA-aware futexes a second u32 word
     containing the NUMA node is added to after the u32 futex value word
     (Peter Zijlstra)

   - Implement the FUTEX2_MPOL ABI, which feature extends the futex
     interface to be mempolicy-aware as well, to further refine futex
     node mappings and lookups (Peter Zijlstra)

  Locking primitives:

   - Misc cleanups (Andy Shevchenko, Borislav Petkov, Colin Ian King,
     Ingo Molnar, Nam Cao, Peter Zijlstra)

  Lockdep:

   - Prevent abuse of lockdep subclasses (Waiman Long)

   - Add number of dynamic keys to /proc/lockdep_stats (Waiman Long)

  Plus misc cleanups and fixes"

* tag 'locking-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  selftests/futex: Fix spelling mistake "unitiliazed" -> "uninitialized"
  futex: Correct the kernedoc return value for futex_wait_setup().
  tools headers: Synchronize prctl.h ABI header
  futex: Use RCU_INIT_POINTER() in futex_mm_init().
  selftests/futex: Use TAP output in futex_numa_mpol
  selftests/futex: Use TAP output in futex_priv_hash
  futex: Fix kernel-doc comments
  futex: Relax the rcu_assign_pointer() assignment of mm->futex_phash in futex_mm_init()
  futex: Fix outdated comment in struct restart_block
  locking/lockdep: Add number of dynamic keys to /proc/lockdep_stats
  locking/lockdep: Prevent abuse of lockdep subclass
  locking/lockdep: Move hlock_equal() to the respective #ifdeffery
  futex,selftests: Add another FUTEX2_NUMA selftest
  selftests/futex: Add futex_numa_mpol
  selftests/futex: Add futex_priv_hash
  selftests/futex: Build without headers nonsense
  tools/perf: Allow to select the number of hash buckets
  tools headers: Synchronize prctl.h ABI header
  futex: Implement FUTEX2_MPOL
  futex: Implement FUTEX2_NUMA
  ...
2025-05-26 14:42:07 -07:00
Linus Torvalds 14f19dc644 fscrypt update for 6.16
Add support for "hardware-wrapped inline encryption keys" to fscrypt.
 When enabled on supported platforms, this feature protects file contents
 keys from certain attacks, such as cold boot attacks.
 
 This feature uses the block layer support for wrapped keys which was
 merged in 6.15.  Wrapped key support has existed out-of-tree in Android
 for a long time, and it's finally ready for upstream now that there is a
 platform on which it works end-to-end with upstream.  Specifically,
 it works on the Qualcomm SM8650 HDK, using the Qualcomm ICE (Inline
 Crypto Engine) and HWKM (Hardware Key Manager).  The corresponding
 driver support is included in the SCSI tree for 6.16.  Validation for
 this feature includes two new tests that were already merged into
 xfstests (generic/368 and generic/369).
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCaDNaqxQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK1K7AP92naB88sRzH1KG7Oic9+dMK+PImARP
 f15ebG2TzQ3qBgEAreqtNmtCNOH7pguYsTeAcX3Y243vzIkwkDRGk7k+aAI=
 =P6Sj
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux

Pull fscrypt update from Eric Biggers:
 "Add support for 'hardware-wrapped inline encryption keys' to fscrypt.

  When enabled on supported platforms, this feature protects file
  contents keys from certain attacks, such as cold boot attacks.

  This feature uses the block layer support for wrapped keys which was
  merged in 6.15. Wrapped key support has existed out-of-tree in Android
  for a long time, and it's finally ready for upstream now that there is
  a platform on which it works end-to-end with upstream.

  Specifically, it works on the Qualcomm SM8650 HDK, using the Qualcomm
  ICE (Inline Crypto Engine) and HWKM (Hardware Key Manager). The
  corresponding driver support is included in the SCSI tree for 6.16.

  Validation for this feature includes two new tests that were already
  merged into xfstests (generic/368 and generic/369)"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  fscrypt: add support for hardware-wrapped keys
2025-05-26 13:27:40 -07:00
Paolo Bonzini 1f7c9d52b1 KVM/riscv changes for 6.16
- Add vector registers to get-reg-list selftest
 - VCPU reset related improvements
 - Remove scounteren initialization from VCPU reset
 - Support VCPU reset from userspace using set_mpstate() ioctl
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmgx8nkACgkQrUjsVaLH
 LAdmYw//aoxlXA8OKFMWFzEC4yZHfq502X1p95Z9h1i8IYKoRcqNxX8O5MJt0y8Z
 dyhgqKLT5q5bpUf/yxJjuJdZQJtCTC0o1bx89sSiM83M2J+AZymcIS5yFVIcxetv
 JRsaWh8GDc/XYUFBnjYCj5zmA5IObbd/QCVnkATjVRcRS56LnRv98P9YPxN7zhHF
 OhDRWELtUowk6FMgiHo75R9vYhOO1ywzzjLFnK5ZHLSYeQ7bhRX/jY5n1BShsWjp
 k+zVRTpJobw9AU76EPQaEKuqiynQz35ZPkxAiuWYac6SDKwmztvGJ1fCnSNAkJLk
 0kN/eAxv61F5nCRJxOkxK0Uy3U94zA6zX53VdnLoRN4rYpA8CYrE4mNAddN31IBC
 NHlBS59w2EsUxIRL1FiCUrEKKgeSWqJY1NuqsmB4ogeo4MKm8n1OhiYSU0l7NnQ6
 3h77ccnHN95cah2C9XDX3GeZ+on6z1t6FjZ4Enki1w3CCfGdKMsfphwUckTNzSvw
 hTeXhYHcP4VKsYkCcitdLR/VFwO4a3HlnjAtHdJLh0qfJ5SergifZ5/eQXhHu7f1
 1uyfk/6nKculr/8yzUVkOR7kerMjBuBx8jir89ceKD8qeA+4MUI1rdwEvzbzaEY9
 9aDwEVbQ1qMjpeONfKUvbyLJ7uN5Lae1/X4Kafmo2TWNAvtLPLo=
 =+S8x
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.16-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.16

- Add vector registers to get-reg-list selftest
- VCPU reset related improvements
- Remove scounteren initialization from VCPU reset
- Support VCPU reset from userspace using set_mpstate() ioctl
2025-05-26 16:27:00 -04:00
Paolo Bonzini 4d526b02df KVM/arm64 updates for 6.16
* New features:
 
   - Add large stage-2 mapping support for non-protected pKVM guests,
     clawing back some performance.
 
   - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and
     protected modes.
 
   - Enable nested virtualisation support on systems that support it
     (yes, it has been a long time coming), though it is disabled by
     default.
 
 * Improvements, fixes and cleanups:
 
   - Large rework of the way KVM tracks architecture features and links
     them with the effects of control bits. This ensures correctness of
     emulation (the data is automatically extracted from the published
     JSON files), and helps dealing with the evolution of the
     architecture.
 
   - Significant changes to the way pKVM tracks ownership of pages,
     avoiding page table walks by storing the state in the hypervisor's
     vmemmap. This in turn enables the THP support described above.
 
   - New selftest checking the pKVM ownership transition rules
 
   - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
     even if the host didn't have it.
 
   - Fixes for the address translation emulation, which happened to be
     rather buggy in some specific contexts.
 
   - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
     from the number of counters exposed to a guest and addressing a
     number of issues in the process.
 
   - Add a new selftest for the SVE host state being corrupted by a
     guest.
 
   - Keep HCR_EL2.xMO set at all times for systems running with the
     kernel at EL2, ensuring that the window for interrupts is slightly
     bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
 
   - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
     from a pretty bad case of TLB corruption unless accesses to HCR_EL2
     are heavily synchronised.
 
   - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
     tables in a human-friendly fashion.
 
   - and the usual random cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmgwU7UACgkQI9DQutE9
 ekN93g//fNnejxf01dBFIbuylzYEyHZSEH0iTGLeM+ES9zvntCzciTYVzb27oqNG
 RDLShlQYp3w4rAe6ORzyePyHptOmKXCxfj/VXUFp3A7H9QYOxt1nacD3WxI9fCOo
 LzaSLquvgwFBaeTdDE0KdeTUKQHluId+w1Azh0lnHGeUP+lOHNZ8FqoP1/la0q04
 GvVL+l3wz/IhPP8r1YA0Q1bzJ5SLfSpjIw/0F5H/xgI4lyYdHzgFL8sKuSyFeCyM
 2STQi+ZnTCsAs4bkXkw2Pp9CFYrfQgZi+sf7Om+noAKhbJo3vb7/RHpgjv+QCjJy
 Kx4g9CbxHfaM03cH6uSLBoFzsACR1iAuUz8BCSRvvVNH4RVT6H+34nzjLZXLncrP
 gm1uYs9aMTLr91caeAx0aYIMWGYa1uqV0rum3WxyIHezN9Q/NuQoZyfprUufr8oX
 wCYE+ot4VT3DwG0UFZKKwj0BiCbYcbph9nBLVyZJsg8OKxpvspkCtPriFp1kb6BP
 dTTGSXd9JJqwSgP9qJLxijcv6Nfgp2gT42TWwh/dJRZXhnTCvr9IyclFIhoIIq3G
 Q2BkFCXOoEoNQhBA1tiWzJ9nDHf52P72Z2K1gPyyMZwF49HGa2BZBCJGkqX06wSs
 Riolf1/cjFhDno1ThiHKsHT0sG1D4oc9k/1NLq5dyNAEGcgATIA=
 =Jju3
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.16

* New features:

  - Add large stage-2 mapping support for non-protected pKVM guests,
    clawing back some performance.

  - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and
    protected modes.

  - Enable nested virtualisation support on systems that support it
    (yes, it has been a long time coming), though it is disabled by
    default.

* Improvements, fixes and cleanups:

  - Large rework of the way KVM tracks architecture features and links
    them with the effects of control bits. This ensures correctness of
    emulation (the data is automatically extracted from the published
    JSON files), and helps dealing with the evolution of the
    architecture.

  - Significant changes to the way pKVM tracks ownership of pages,
    avoiding page table walks by storing the state in the hypervisor's
    vmemmap. This in turn enables the THP support described above.

  - New selftest checking the pKVM ownership transition rules

  - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
    even if the host didn't have it.

  - Fixes for the address translation emulation, which happened to be
    rather buggy in some specific contexts.

  - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
    from the number of counters exposed to a guest and addressing a
    number of issues in the process.

  - Add a new selftest for the SVE host state being corrupted by a
    guest.

  - Keep HCR_EL2.xMO set at all times for systems running with the
    kernel at EL2, ensuring that the window for interrupts is slightly
    bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

  - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
    from a pretty bad case of TLB corruption unless accesses to HCR_EL2
    are heavily synchronised.

  - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
    tables in a human-friendly fashion.

  - and the usual random cleanups.
2025-05-26 16:19:46 -04:00
Paolo Bonzini 85502b2214 LoongArch KVM changes for v6.16
1. Don't flush tlb if HW PTW supported.
 2. Add LoongArch KVM selftests support.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmgsdO8WHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImejUWD/9zNaBSqTpqeAptRQ6qTKdrxYtN
 ZKJR9a8AQF5vMPD9dWoRr6iLaNt061GqBOKbhF5RGUVq6uIDfCkZbSbX1h6Ptgcz
 OwkJHbrZAu+Z31NSoYqbgPZnurSJ9oOUGsghqp3ecKEf0LptTVaKw4WDeKkKNOlq
 eJaUC1WJ1P4sSXhTlHKAl79Ds/1pza9iAgtKXothrh09PL48LYaWFpmiS0uQmKOD
 qwYVU/OJIUWn4ZOxyGdk6ZR0B+mJOwwoO1ILptRIeSk4oTKiE1HiIEqnhdnMrJFr
 DkRdwQ/jq7iH/CFkVybNzxgLqAqpGLJgPj5VakYPmp2scuWSESej9/o8wYrmn0y0
 QuDQah0LiRIcbUYRHLDkfKRKCndfJ5KCrpOiD5mZ8bMd2LF9hPnH5toCXb/ZEpsK
 plu9qUrQgXF8rkX/zCIvQrOp0kYdU8DMbhZsXymSVpEs1fwrCNp2/Z2PrMYLKGt+
 JT+65jVRJ67d1OdKw3DrWQA8Au0Ma1rgzX3oLDu8wnqAG7ULAJRVIkDMcxG7a5SQ
 P+DlIbEHC1U8Dw8hW+PFhfOl13M9p5s3EP5fy8q85UJ/5fJ41fCPBUOm/rfMQcci
 6/e+xwBKKkJ7oQ/fFKvJ+n6GbV6suV5FUxicYQjC43iiCKklRhG77uek/DK3c5um
 ZiTu+YapX9qbG54Q9w==
 =YM/u
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-kvm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD

LoongArch KVM changes for v6.16

1. Don't flush tlb if HW PTW supported.
2. Add LoongArch KVM selftests support.
2025-05-26 16:12:13 -04:00
Linus Torvalds f83fcb87f8 xfs: New code for 6.16
Signed-off-by: Carlos Maiolino <cem@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQSmtYVZ/MfVMGUq1GNcsMJ8RxYuYwUCaDQXTQAKCRBcsMJ8RxYu
 YwUHAYDYYm9oit6AIr0AgTXBMJ+DHyqaszBy0VT2jQUP+yXxyrQc46QExXKU9YQV
 ffmGRAsBgN7ZdDI8D5qWySyOynB3b1Jn3/0jY82GscFK0k0oX3EtxbN9MdrovbgK
 qyO66BVx7w==
 =pG5y
 -----END PGP SIGNATURE-----

Merge tag 'xfs-merge-6.16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Carlos Maiolino:

 - Atomic writes for XFS

 - Remove experimental warnings for pNFS, scrub and parent pointers

* tag 'xfs-merge-6.16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits)
  xfs: add inode to zone caching for data placement
  xfs: free the item in xfs_mru_cache_insert on failure
  xfs: remove the EXPERIMENTAL warning for pNFS
  xfs: remove some EXPERIMENTAL warnings
  xfs: Remove deprecated xfs_bufd sysctl parameters
  xfs: stop using set_blocksize
  xfs: allow sysadmins to specify a maximum atomic write limit at mount time
  xfs: update atomic write limits
  xfs: add xfs_calc_atomic_write_unit_max()
  xfs: add xfs_file_dio_write_atomic()
  xfs: commit CoW-based atomic writes atomically
  xfs: add large atomic writes checks in xfs_direct_write_iomap_begin()
  xfs: add xfs_atomic_write_cow_iomap_begin()
  xfs: refine atomic write size check in xfs_file_write_iter()
  xfs: refactor xfs_reflink_end_cow_extent()
  xfs: allow block allocator to take an alignment hint
  xfs: ignore HW which cannot atomic write a single block
  xfs: add helpers to compute transaction reservation for finishing intent items
  xfs: add helpers to compute log item overhead
  xfs: separate out setting buftarg atomic writes limits
  ...
2025-05-26 12:56:01 -07:00
Linus Torvalds 49fffac983 for-6.16/io_uring-20250523
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmgwnDgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgHZEADA1ym0ihHRjU2kTlXXOdkLLOl+o1RCHUjr
 KNf6sELGgyDC5FL/hAWdsjonInY4MLbJW0eNHEuuK8iFcn3wSHuHPXhRJXx/4cOs
 GGVLTd+Jm8ih4UL/GeLrBe3ehW9UUOtz1TCYzho0bdXHQWjruCFTqB5OzPQFMGQW
 R/lwXVNfjgGno5JhBnsrwz3ZnAfAnJhxqmc0GFHaa/nVF1OREYW/HS75EPFNiFgp
 Aevilw5QyrA2gDlZ+zCUwaGKAEl32yZCI6LZpI4kMtPK1reEbgFTrzIaCZ/OZCYM
 DVdBVEeuOmcBYIKbitD/+fcLNXHMrSJSWvUSXR4GuRNVkCTIAcEMKM2bX8VY7gmJ
 7ZQIo0EL2mSwmewHIYnvf9w/qrNYR0NyUt2v4U4rA2wj6e5w1EYMriP94wKdBGvD
 RNxja429N3fg3aBIkdQ6iYSVJRgE7DCo7dnKrEqglZPb32LOiNoOoou9shI5tb25
 8X7u0HzbpwKY/XByXZ2IaX7PYK2iFqkJjFYlGehtF97W85LGEvkDFU6fcBdjBO8r
 umgeE5O+lR+cf68JTJ6P34A7bBg71AXO3ytIuWunG56/0yu/FHDCjhBWE5ZjEhGR
 u2YhAGPRDQsJlSlxx8TXoKyYWP55NqdeyxYrmku/fZLn5WNVXOFeRlUDAZsF7mU7
 nuiOt9j4WA==
 =k8SF
 -----END PGP SIGNATURE-----

Merge tag 'for-6.16/io_uring-20250523' of git://git.kernel.dk/linux

Pull io_uring updates from Jens Axboe:

 - Avoid indirect function calls in io-wq for executing and freeing
   work.

   The design of io-wq is such that it can be a generic mechanism, but
   as it's just used by io_uring now, may as well avoid these indirect
   calls

 - Clean up registered buffers for networking

 - Add support for IORING_OP_PIPE. Pretty straight forward, allows
   creating pipes with io_uring, particularly useful for having these be
   instantiated as direct descriptors

 - Clean up the coalescing support fore registered buffers

 - Add support for multiple interface queues for zero-copy rx
   networking. As this feature was merged for 6.15 it supported just a
   single ifq per ring

 - Clean up the eventfd support

 - Add dma-buf support to zero-copy rx

 - Clean up and improving the request draining support

 - Clean up provided buffer support, most notably with an eye toward
   making the legacy support less intrusive

 - Minor fdinfo cleanups, dropping support for dumping what credentials
   are registered

 - Improve support for overflow CQE handling, getting rid of GFP_ATOMIC
   for allocating overflow entries where possible

 - Improve detection of cases where io-wq doesn't need to spawn a new
   worker unnecessarily

 - Various little cleanups

* tag 'for-6.16/io_uring-20250523' of git://git.kernel.dk/linux: (59 commits)
  io_uring/cmd: warn on reg buf imports by ineligible cmds
  io_uring/io-wq: only create a new worker if it can make progress
  io_uring/io-wq: ignore non-busy worker going to sleep
  io_uring/io-wq: move hash helpers to the top
  trace/io_uring: fix io_uring_local_work_run ctx documentation
  io_uring: finish IOU_OK -> IOU_COMPLETE transition
  io_uring: add new helpers for posting overflows
  io_uring: pass in struct io_big_cqe to io_alloc_ocqe()
  io_uring: make io_alloc_ocqe() take a struct io_cqe pointer
  io_uring: split alloc and add of overflow
  io_uring: open code io_req_cqe_overflow()
  io_uring/fdinfo: get rid of dumping credentials
  io_uring/fdinfo: only compile if CONFIG_PROC_FS is set
  io_uring/kbuf: unify legacy buf provision and removal
  io_uring/kbuf: refactor __io_remove_buffers
  io_uring/kbuf: don't compute size twice on prep
  io_uring/kbuf: drop extra vars in io_register_pbuf_ring
  io_uring/kbuf: use mem_is_zero()
  io_uring/kbuf: account ring io_buffer_list memory
  io_uring: drain based on allocates reqs
  ...
2025-05-26 12:13:22 -07:00
Linus Torvalds 6f59de9bc0 for-6.16/block-20250523
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmgwnGYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpq9aD/4iqOts77xhWWLrOJWkkhOcV5rREeyppq8X
 MKYul9S4cc4Uin9Xou9a+nab31QBQEk3nsN3kX9o3yAXvkh6yUm36HD8qYNW/46q
 IUkwRQQJ0COyTnexMZQNTbZPQDIYcenXmQxOcrEJ5jC1Jcz0sOKHsgekL+ab3kCy
 fLnuz2ozvjGDMala/NmE8fN5qSlj4qQABHgbamwlwfo4aWu07cwfqn5G/FCYJgDO
 xUvsnTVclom2g4G+7eSSvGQI1QyAxl5QpviPnj/TEgfFBFnhbCSoBTEY6ecqhlfW
 6u59MF/Uw8E+weiuGY4L87kDtBhjQs3UMSLxCuwH7MxXb25ff7qB4AIkcFD0kKFH
 3V5NtwqlU7aQT0xOjGxaHhfPwjLD+FVss4ARmuHS09/Kn8egOW9yROPyetnuH84R
 Oz0Ctnt1IPLFjvGeg3+rt9fjjS9jWOXLITb9Q6nX9gnCt7orCwIYke8YCpmnJyhn
 i+fV4CWYIQBBRKxIT0E/GhJxZOmL0JKpomnbpP2dH8npemnsTCuvtfdrK9gfhH2X
 chBVqCPY8MNU5zKfzdEiavPqcm9392lMzOoOXW2pSC1eAKqnAQ86ZT3r7rLntqE8
 75LxHcvaQIsnpyG+YuJVHvoiJ83TbqZNpyHwNaQTYhDmdYpp2d/wTtTQywX4DuXb
 Y6NDJw5+kQ==
 =1PNK
 -----END PGP SIGNATURE-----

Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - ublk updates:
      - Add support for updating the size of a ublk instance
      - Zero-copy improvements
      - Auto-registering of buffers for zero-copy
      - Series simplifying and improving GET_DATA and request lookup
      - Series adding quiesce support
      - Lots of selftests additions
      - Various cleanups

 - NVMe updates via Christoph:
      - add per-node DMA pools and use them for PRP/SGL allocations
        (Caleb Sander Mateos, Keith Busch)
      - nvme-fcloop refcounting fixes (Daniel Wagner)
      - support delayed removal of the multipath node and optionally
        support the multipath node for private namespaces (Nilay Shroff)
      - support shared CQs in the PCI endpoint target code (Wilfred
        Mallawa)
      - support admin-queue only authentication (Hannes Reinecke)
      - use the crc32c library instead of the crypto API (Eric Biggers)
      - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes
        Reinecke, Leon Romanovsky, Gustavo A. R. Silva)

 - MD updates via Yu:
      - Fix that normal IO can be starved by sync IO, found by mkfs on
        newly created large raid5, with some clean up patches for bdev
        inflight counters

 - Clean up brd, getting rid of atomic kmaps and bvec poking

 - Add loop driver specifically for zoned IO testing

 - Eliminate blk-rq-qos calls with a static key, if not enabled

 - Improve hctx locking for when a plug has IO for multiple queues
   pending

 - Remove block layer bouncing support, which in turn means we can
   remove the per-node bounce stat as well

 - Improve blk-throttle support

 - Improve delay support for blk-throttle

 - Improve brd discard support

 - Unify IO scheduler switching. This should also fix a bunch of lockdep
   warnings we've been seeing, after enabling lockdep support for queue
   freezing/unfreezeing

 - Add support for block write streams via FDP (flexible data placement)
   on NVMe

 - Add a bunch of block helpers, facilitating the removal of a bunch of
   duplicated boilerplate code

 - Remove obsolete BLK_MQ pci and virtio Kconfig options

 - Add atomic/untorn write support to blktrace

 - Various little cleanups and fixes

* tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits)
  selftests: ublk: add test for UBLK_F_QUIESCE
  ublk: add feature UBLK_F_QUIESCE
  selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
  traceevent/block: Add REQ_ATOMIC flag to block trace events
  ublk: run auto buf unregisgering in same io_ring_ctx with registering
  io_uring: add helper io_uring_cmd_ctx_handle()
  ublk: remove io argument from ublk_auto_buf_reg_fallback()
  ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch()
  selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK
  selftests: ublk: support UBLK_F_AUTO_BUF_REG
  ublk: support UBLK_AUTO_BUF_REG_FALLBACK
  ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG
  ublk: prepare for supporting to register request buffer automatically
  ublk: convert to refcount_t
  selftests: ublk: make IO & device removal test more stressful
  nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk
  nvme: introduce multipath_always_on module param
  nvme-multipath: introduce delayed removal of the multipath head node
  nvme-pci: derive and better document max segments limits
  nvme-pci: use struct_size for allocation struct nvme_dev
  ...
2025-05-26 11:39:36 -07:00
Linus Torvalds c5bfc48d54 vfs-6.16-rc1.coredump
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPTwAKCRCRxhvAZXjc
 oliqAQCVdrBn7D2+dB04hjefFq6W6LhyLGrtCCliflicN5SyxAD+PHHiB9nFKe6J
 xQkaNArCJjPd2QEx73aGjHzi3UQq6Qs=
 =Pk9c
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull coredump updates from Christian Brauner:
 "This adds support for sending coredumps over an AF_UNIX socket. It
  also makes (implicit) use of the new SO_PEERPIDFD ability to hand out
  pidfds for reaped peer tasks

  The new coredump socket will allow userspace to not have to rely on
  usermode helpers for processing coredumps and provides a saf way to
  handle them instead of relying on super privileged coredumping helpers

  This will also be significantly more lightweight since the kernel
  doens't have to do a fork()+exec() for each crashing process to spawn
  a usermodehelper. Instead the kernel just connects to the AF_UNIX
  socket and userspace can process it concurrently however it sees fit.
  Support for userspace is incoming starting with systemd-coredump

  There's more work coming in that direction next cycle. The rest below
  goes into some details and background

  Coredumping currently supports two modes:

   (1) Dumping directly into a file somewhere on the filesystem.

   (2) Dumping into a pipe connected to a usermode helper process
       spawned as a child of the system_unbound_wq or kthreadd

  For simplicity I'm mostly ignoring (1). There's probably still some
  users of (1) out there but processing coredumps in this way can be
  considered adventurous especially in the face of set*id binaries

  The most common option should be (2) by now. It works by allowing
  userspace to put a string into /proc/sys/kernel/core_pattern like:

          |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

  The "|" at the beginning indicates to the kernel that a pipe must be
  used. The path following the pipe indicator is a path to a binary that
  will be spawned as a usermode helper process. Any additional
  parameters pass information about the task that is generating the
  coredump to the binary that processes the coredump

  In the example the core_pattern shown causes the kernel to spawn
  systemd-coredump as a usermode helper. There's various conceptual
  consequences of this (non-exhaustive list):

   - systemd-coredump is spawned with file descriptor number 0 (stdin)
     connected to the read-end of the pipe. All other file descriptors
     are closed. That specifically includes 1 (stdout) and 2 (stderr).

     This has already caused bugs because userspace assumed that this
     cannot happen (Whether or not this is a sane assumption is
     irrelevant)

   - systemd-coredump will be spawned as a child of system_unbound_wq.
     So it is not a child of any userspace process and specifically not
     a child of PID 1. It cannot be waited upon and is in a weird hybrid
     upcall which are difficult for userspace to control correctly

   - systemd-coredump is spawned with full kernel privileges. This
     necessitates all kinds of weird privilege dropping excercises in
     userspace to make this safe

   - A new usermode helper has to be spawned for each crashing process

  This adds a new mode:

   (3) Dumping into an AF_UNIX socket

  Userspace can set /proc/sys/kernel/core_pattern to:

          @/path/to/coredump.socket

  The "@" at the beginning indicates to the kernel that an AF_UNIX
  coredump socket will be used to process coredumps

  The coredump socket must be located in the initial mount namespace.
  When a task coredumps it opens a client socket in the initial network
  namespace and connects to the coredump socket:

   - The coredump server uses SO_PEERPIDFD to get a stable handle on the
     connected crashing task. The retrieved pidfd will provide a stable
     reference even if the crashing task gets SIGKILLed while generating
     the coredump. That is a huge attack vector right now

   - By setting core_pipe_limit non-zero userspace can guarantee that
     the crashing task cannot be reaped behind it's back and thus
     process all necessary information in /proc/<pid>. The SO_PEERPIDFD
     can be used to detect whether /proc/<pid> still refers to the same
     process

     The core_pipe_limit isn't used to rate-limit connections to the
     socket. This can simply be done via AF_UNIX socket directly

   - The pidfd for the crashing task will contain information how the
     task coredumps. The PIDFD_GET_INFO ioctl gained a new flag
     PIDFD_INFO_COREDUMP which can be used to retreive the coredump
     information

     If the coredump gets a new coredump client connection the kernel
     guarantees that PIDFD_INFO_COREDUMP information is available.

     Currently the following information is provided in the new
     @coredump_mask extension to struct pidfd_info:

      * PIDFD_COREDUMPED is raised if the task did actually coredump

      * PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping
        (e.g., undumpable)

      * PIDFD_COREDUMP_USER is raised if this is a regular coredump and
        doesn't need special care by the coredump server

      * PIDFD_COREDUMP_ROOT is raised if the generated coredump should
        be treated as sensitive and the coredump server should restrict
        access to the generated coredump to sufficiently privileged
        users"

* tag 'vfs-6.16-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  mips, net: ensure that SOCK_COREDUMP is defined
  selftests/coredump: add tests for AF_UNIX coredumps
  selftests/pidfd: add PIDFD_INFO_COREDUMP infrastructure
  coredump: validate socket name as it is written
  coredump: show supported coredump modes
  pidfs, coredump: add PIDFD_INFO_COREDUMP
  coredump: add coredump socket
  coredump: reflow dump helpers a little
  coredump: massage do_coredump()
  coredump: massage format_corename()
2025-05-26 11:17:01 -07:00
Linus Torvalds 7d7a103d29 vfs-6.16-rc1.pidfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPTwAKCRCRxhvAZXjc
 ov4zAP4yfqKBAz6eMt9CzDgHCdVQJ9Nuur1EiRdot3maPzHTcQEA2hVkJrvVo1Y/
 jCVAf7gmGX1Uu6nCUF6Vjy35g8i20gA=
 =nzYS
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull pidfs updates from Christian Brauner:
 "Features:

   - Allow handing out pidfds for reaped tasks for AF_UNIX SO_PEERPIDFD
     socket option

     SO_PEERPIDFD is a socket option that allows to retrieve a pidfd for
     the process that called connect() or listen(). This is heavily used
     to safely authenticate clients in userspace avoiding security bugs
     due to pid recycling races (dbus, polkit, systemd, etc.)

     SO_PEERPIDFD currently doesn't support handing out pidfds if the
     sk->sk_peer_pid thread-group leader has already been reaped. In
     this case it currently returns EINVAL. Userspace still wants to get
     a pidfd for a reaped process to have a stable handle it can pass
     on. This is especially useful now that it is possible to retrieve
     exit information through a pidfd via the PIDFD_GET_INFO ioctl()'s
     PIDFD_INFO_EXIT flag

     Another summary has been provided by David Rheinsberg:

      > A pidfd can outlive the task it refers to, and thus user-space
      > must already be prepared that the task underlying a pidfd is
      > gone at the time they get their hands on the pidfd. For
      > instance, resolving the pidfd to a PID via the fdinfo must be
      > prepared to read `-1`.
      >
      > Despite user-space knowing that a pidfd might be stale, several
      > kernel APIs currently add another layer that checks for this. In
      > particular, SO_PEERPIDFD returns `EINVAL` if the peer-task was
      > already reaped, but returns a stale pidfd if the task is reaped
      > immediately after the respective alive-check.
      >
      > This has the unfortunate effect that user-space now has two ways
      > to check for the exact same scenario: A syscall might return
      > EINVAL/ESRCH/... *or* the pidfd might be stale, even though
      > there is no particular reason to distinguish both cases. This
      > also propagates through user-space APIs, which pass on pidfds.
      > They must be prepared to pass on `-1` *or* the pidfd, because
      > there is no guaranteed way to get a stale pidfd from the kernel.
      >
      > Userspace must already deal with a pidfd referring to a reaped
      > task as the task may exit and get reaped at any time will there
      > are still many pidfds referring to it

     In order to allow handing out reaped pidfd SO_PEERPIDFD needs to
     ensure that PIDFD_INFO_EXIT information is available whenever a
     pidfd for a reaped task is created by PIDFD_INFO_EXIT. The uapi
     promises that reaped pidfds are only handed out if it is guaranteed
     that the caller sees the exit information:

     TEST_F(pidfd_info, success_reaped)
     {
             struct pidfd_info info = {
                     .mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT,
             };

             /*
              * Process has already been reaped and PIDFD_INFO_EXIT been set.
              * Verify that we can retrieve the exit status of the process.
              */
             ASSERT_EQ(ioctl(self->child_pidfd4, PIDFD_GET_INFO, &info), 0);
             ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS));
             ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT));
             ASSERT_TRUE(WIFEXITED(info.exit_code));
             ASSERT_EQ(WEXITSTATUS(info.exit_code), 0);
     }

     To hand out pidfds for reaped processes we thus allocate a pidfs
     entry for the relevant sk->sk_peer_pid at the time the
     sk->sk_peer_pid is stashed and drop it when the socket is
     destroyed. This guarantees that exit information will always be
     recorded for the sk->sk_peer_pid task and we can hand out pidfds
     for reaped processes

   - Hand a pidfd to the coredump usermode helper process

     Give userspace a way to instruct the kernel to install a pidfd for
     the crashing process into the process started as a usermode helper.
     There's still tricky race-windows that cannot be easily or
     sometimes not closed at all by userspace. There's various ways like
     looking at the start time of a process to make sure that the
     usermode helper process is started after the crashing process but
     it's all very very brittle and fraught with peril

     The crashed-but-not-reaped process can be killed by userspace
     before coredump processing programs like systemd-coredump have had
     time to manually open a PIDFD from the PID the kernel provides
     them, which means they can be tricked into reading from an
     arbitrary process, and they run with full privileges as they are
     usermode helper processes

     Even if that specific race-window wouldn't exist it's still the
     safest and cleanest way to let the kernel provide the pidfd
     directly instead of requiring userspace to do it manually. In
     parallel with this commit we already have systemd adding support
     for this in [1]

     When the usermode helper process is forked we install a pidfd file
     descriptor three into the usermode helper's file descriptor table
     so it's available to the exec'd program

     Since usermode helpers are either children of the system_unbound_wq
     workqueue or kthreadd we know that the file descriptor table is
     empty and can thus always use three as the file descriptor number

     Note, that we'll install a pidfd for the thread-group leader even
     if a subthread is calling do_coredump(). We know that task linkage
     hasn't been removed yet and even if this @current isn't the actual
     thread-group leader we know that the thread-group leader cannot be
     reaped until
     @current has exited

   - Allow telling when a task has not been found from finding the wrong
     task when creating a pidfd

     We currently report EINVAL whenever a struct pid has no tasked
     attached anymore thereby conflating two concepts:

      (1) The task has already been reaped

      (2) The caller requested a pidfd for a thread-group leader but the
          pid actually references a struct pid that isn't used as a
          thread-group leader

     This is causing issues for non-threaded workloads as in where they
     expect ESRCH to be reported, not EINVAL

     So allow userspace to reliably distinguish between (1) and (2)

   - Make it possible to detect when a pidfs entry would outlive the
     struct pid it pinned

   - Add a range of new selftests

  Cleanups:

   - Remove unneeded NULL check from pidfd_prepare() for passed struct
     pid

   - Avoid pointless reference count bump during release_task()

  Fixes:

   - Various fixes to the pidfd and coredump selftests

   - Fix error handling for replace_fd() when spawning coredump usermode
     helper"

* tag 'vfs-6.16-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  pidfs: detect refcount bugs
  coredump: hand a pidfd to the usermode coredump helper
  coredump: fix error handling for replace_fd()
  pidfs: move O_RDWR into pidfs_alloc_file()
  selftests: coredump: Raise timeout to 2 minutes
  selftests: coredump: Fix test failure for slow machines
  selftests: coredump: Properly initialize pointer
  net, pidfs: enable handing out pidfds for reaped sk->sk_peer_pid
  pidfs: get rid of __pidfd_prepare()
  net, pidfs: prepare for handing out pidfds for reaped sk->sk_peer_pid
  pidfs: register pid in pidfs
  net, pidfd: report EINVAL for ESRCH
  release_task: kill the no longer needed get/put_pid(thread_pid)
  pidfs: ensure consistent ENOENT/ESRCH reporting
  exit: move wake_up_all() pidfd waiters into __unhash_process()
  selftest/pidfd: add test for thread-group leader pidfd open for thread
  pidfd: improve uapi when task isn't found
  pidfd: remove unneeded NULL check from pidfd_prepare()
  selftests/pidfd: adapt to recent changes
2025-05-26 10:30:02 -07:00
Paolo Abeni f5b60d6a57 netfilter pull request 25-05-23
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmgwd00ACgkQ1w0aZmrP
 KyEfwA//RXQ3i8PCa7lKHxDRhVzG3rEvgXRmiXeNd+JjzsCnybBb7+wRf3dtBGWT
 +1s44Utx1JqosWxCVBulqYC5bqSC66789l5X2jhYJmUZxRrbcsqPngwnIrjb/XeK
 ZJM62wiRhkBQED7yZLGy+y4VHQiG8CEMt16AOQHk863aruWv1tT7up90CTtzA545
 4GF/grU3FC0PsoTLwzWyvqsWK+9uk3Y4Tifp5hU3w6uRD9EjX5tHCZlXXSqOF5gu
 KT26OYsePYXhJVZIwDf2oVLGi0EVTPB9IFxZSNgLqyXqu2ILAb9OwRNVTNfTP7Pg
 1RWJWmgqvRNs9OM2ecifYgQf/AfvCL0Cja1BJOjmvtICuGegrYH7G5YYQsMl9CoE
 7jBoTzpToSASat5+dwoz81Bvzh447dYxRE2VmbxmRTTWToQYS1KGBPc9e3u/n5Rr
 ruh8tRZ3/R0Fy+YLDkrJst3grh5RLITbuyu4ElJMArPU50mLTVYxKd6nA3BqwB5G
 1GmLfCzvQH3e6PKz6CNke1AytVDy/wLTXtcbLnze2Muaj4AqhtOe5Q8ypnOO0Vyk
 PsJ6U3rm2asd3GE9+AIx8gZBv8yCu1w9CiwLK8ybT2NETb2dEnqPgWeDyT7rpcaD
 sQOPsBE1q/TEp9gofbYCHBm5E2mX9UP7Q6EHCTekrI97xLq8Q2M=
 =fBhd
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-25-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next,
specifically 26 patches: 5 patches adding/updating selftests,
4 fixes, 3 PREEMPT_RT fixes, and 14 patches to enhance nf_tables):

1) Improve selftest coverage for pipapo 4 bit group format, from
   Florian Westphal.

2) Fix incorrect dependencies when compiling a kernel without
   legacy ip{6}tables support, also from Florian.

3) Two patches to fix nft_fib vrf issues, including selftest updates
   to improve coverage, also from Florian Westphal.

4) Fix incorrect nesting in nft_tunnel's GENEVE support, from
   Fernando F. Mancera.

5) Three patches to fix PREEMPT_RT issues with nf_dup infrastructure
   and nft_inner to match in inner headers, from Sebastian Andrzej Siewior.

6) Integrate conntrack information into nft trace infrastructure,
   from Florian Westphal.

7) A series of 13 patches to allow to specify wildcard netdevice in
   netdev basechain and flowtables, eg.

   table netdev filter {
       chain ingress {
           type filter hook ingress devices = { eth0, eth1, vlan* } priority 0; policy accept;
       }
   }

   This also allows for runtime hook registration on NETDEV_{UN}REGISTER
   event, from Phil Sutter.

netfilter pull request 25-05-23

* tag 'nf-next-25-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: (26 commits)
  selftests: netfilter: Torture nftables netdev hooks
  netfilter: nf_tables: Add notifications for hook changes
  netfilter: nf_tables: Support wildcard netdev hook specs
  netfilter: nf_tables: Sort labels in nft_netdev_hook_alloc()
  netfilter: nf_tables: Handle NETDEV_CHANGENAME events
  netfilter: nf_tables: Wrap netdev notifiers
  netfilter: nf_tables: Respect NETDEV_REGISTER events
  netfilter: nf_tables: Prepare for handling NETDEV_REGISTER events
  netfilter: nf_tables: Have a list of nf_hook_ops in nft_hook
  netfilter: nf_tables: Pass nf_hook_ops to nft_unregister_flowtable_hook()
  netfilter: nf_tables: Introduce nft_register_flowtable_ops()
  netfilter: nf_tables: Introduce nft_hook_find_ops{,_rcu}()
  netfilter: nf_tables: Introduce functions freeing nft_hook objects
  netfilter: nf_tables: add packets conntrack state to debug trace info
  netfilter: conntrack: make nf_conntrack_id callable without a module dependency
  netfilter: nf_dup_netdev: Move the recursion counter struct netdev_xmit
  netfilter: nft_inner: Use nested-BH locking for nft_pcpu_tun_ctx
  netfilter: nf_dup{4, 6}: Move duplication check to task_struct
  netfilter: nft_tunnel: fix geneve_opt dump
  selftests: netfilter: nft_fib.sh: add type and oif tests with and without VRFs
  ...
====================

Link: https://patch.msgid.link/20250523132712.458507-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26 18:53:41 +02:00
Ingo Molnar 94ec70880f Merge branch 'locking/futex' into locking/core, to pick up pending futex changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-25 10:10:08 +02:00
Ming Lei b465ae7b25 ublk: add feature UBLK_F_QUIESCE
Add feature UBLK_F_QUIESCE, which adds control command `UBLK_U_CMD_QUIESCE_DEV`
for quiescing device, then device state can become `UBLK_S_DEV_QUIESCED`
or `UBLK_S_DEV_FAIL_IO` finally from ublk_ch_release() with ublk server
cooperation.

This feature can help to support to upgrade ublk server application by
shutting down ublk server gracefully, meantime keep ublk block device
persistent during the upgrading period.

The feature is only available for UBLK_F_USER_RECOVERY.

Suggested-by: Yoav Cohen <yoav@nvidia.com>
Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250522163523.406289-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23 09:42:12 -06:00
Phil Sutter 465b9ee0ee netfilter: nf_tables: Add notifications for hook changes
Notify user space if netdev hooks are updated due to netdev add/remove
events. Send minimal notification messages by introducing
NFT_MSG_NEWDEV/DELDEV message types describing a single device only.

Upon NETDEV_CHANGENAME, the callback has no information about the
interface's old name. To provide a clear message to user space, include
the hook's stored interface name in the notification.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23 13:57:14 +02:00
Florian Westphal 7e5c6aa67e netfilter: nf_tables: add packets conntrack state to debug trace info
Add the minimal relevant info needed for userspace ("nftables monitor
trace") to provide the conntrack view of the packet:

- state (new, related, established)
- direction (original, reply)
- status (e.g., if connection is subject to dnat)
- id (allows to query ctnetlink for remaining conntrack state info)

Example:
trace id a62 inet filter PRE_RAW packet: iif "enp0s3" ether [..]
  [..]
trace id a62 inet filter PRE_MANGLE conntrack: ct direction original ct state new ct id 32
trace id a62 inet filter PRE_MANGLE packet: [..]
 [..]
trace id a62 inet filter IN conntrack: ct direction original ct state new ct status dnat-done ct id 32
 [..]

In this case one can see that while NAT is active, the new connection
isn't subject to a translation.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23 13:57:12 +02:00
Marc Zyngier 7f3225fe8b Merge branch kvm-arm64/nv-nv into kvmarm-master/next
* kvm-arm64/nv-nv:
  : .
  : Flick the switch on the NV support by adding the missing piece
  : in the form of the VNCR page management. From the cover letter:
  :
  : "This is probably the most interesting bit of the whole NV adventure.
  : So far, everything else has been a walk in the park, but this one is
  : where the real fun takes place.
  :
  : With FEAT_NV2, most of the NV support revolves around tricking a guest
  : into accessing memory while it tries to access system registers. The
  : hypervisor's job is to handle the context switch of the actual
  : registers with the state in memory as needed."
  : .
  KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
  KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
  KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
  KVM: arm64: Document NV caps and vcpu flags
  KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
  KVM: arm64: nv: Remove dead code from ERET handling
  KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
  KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
  KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
  KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
  KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
  KVM: arm64: nv: Handle VNCR_EL2-triggered faults
  KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
  KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
  KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
  KVM: arm64: nv: Move TLBI range decoding to a helper
  KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
  KVM: arm64: nv: Extract translation helper from the AT code
  KVM: arm64: nv: Allocate VNCR page when required
  arm64: sysreg: Add layout for VNCR_EL2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23 10:58:57 +01:00
Ming Lei 914e0dc508 ublk: run auto buf unregisgering in same io_ring_ctx with registering
UBLK_F_AUTO_BUF_REG requires that the buffer registered automatically
is unregistered in same `io_ring_ctx`, so check it explicitly.

Document this requirement for UBLK_F_AUTO_BUF_REG.

Drop WARN_ON_ONCE() which is triggered from userspace code path.

Fixes: 99c1e4eb6a ("ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG")
Reported-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250522152043.399824-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-22 10:03:55 -06:00
Kory Maincent 4ff4d86f6c net: Add support for providing the PTP hardware source in tsinfo
Multi-PTP source support within a network topology has been merged,
but the hardware timestamp source is not yet exposed to users.
Currently, users only see the PTP index, which does not indicate
whether the timestamp comes from a PHY or a MAC.

Add support for reporting the hwtstamp source using a
hwtstamp-source field, alongside hwtstamp-phyindex, to describe
the origin of the hardware timestamp.

Remove HWTSTAMP_SOURCE_UNSPEC enum value as it is not used at all.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250519-feature_ptp_source-v4-1-5d10e19a0265@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-22 15:32:00 +02:00
Ingo Molnar 44889ff67c perf/uapi: Clean up <uapi/linux/perf_event.h> a bit
When applying a recent commit to the <uapi/linux/perf_event.h>
header I noticed that we have accumulated quite a bit of
historic noise in this header, so do a bit of spring cleaning:

 - Define bitfields in a vertically aligned fashion, like
   perf_event_mmap_page::capabilities already does. This
   makes it easier to see the distribution and sizing of
   bits within a word, at a glance. The following is much
   more readable:

			__u64	cap_bit0		: 1,
				cap_bit0_is_deprecated	: 1,
				cap_user_rdpmc		: 1,
				cap_user_time		: 1,
				cap_user_time_zero	: 1,
				cap_user_time_short	: 1,
				cap_____res		: 58;

   Than:

			__u64	cap_bit0:1,
				cap_bit0_is_deprecated:1,
				cap_user_rdpmc:1,
				cap_user_time:1,
				cap_user_time_zero:1,
				cap_user_time_short:1,
				cap_____res:58;

   So convert all bitfield definitions from the latter style to the
   former style.

 - Fix typos and grammar

 - Fix capitalization

 - Remove whitespace noise

 - Harmonize the definitions of various generations and groups of
   PERF_MEM_ ABI values.

 - Vertically align all definitions and assignments to the same
   column (48), as the first definition (enum perf_type_id),
   throughout the entire header.

 - And in general make the code and comments to be more in sync
   with each other and to be more readable overall.

No change in functionality.

Copy the changes over to tools/include/uapi/linux/perf_event.h.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250521221529.2547099-1-irogers@google.com
2025-05-22 11:03:41 +02:00
Ian Rogers f4b18ff2c1 perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h>
AAUX data for PERF_SAMPLE_AUX appears last. PERF_SAMPLE_CGROUP is
missing from the comment.

This makes the <uapi/linux/perf_event.h> comment match that in the
perf_event_open man page.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20250521221529.2547099-1-irogers@google.com
2025-05-22 10:01:34 +02:00
Jakub Kicinski 8b8762eeec tools: ynl-gen: add makefile deps for neigh
Kory is reporting build issues after recent additions to YNL
if the system headers are old.

Link: https://lore.kernel.org/20250519164949.597d6e92@kmaincent-XPS-13-7390
Reported-by: Kory Maincent <kory.maincent@bootlin.com>
Fixes: 0939a418b3 ("tools: ynl: submsg: reverse parse / error reporting")
Tested-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250520161916.413298-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21 12:38:20 -07:00
Christian Brauner 1d8db6fd69
pidfs, coredump: add PIDFD_INFO_COREDUMP
Extend the PIDFD_INFO_COREDUMP ioctl() with the new PIDFD_INFO_COREDUMP
mask flag. This adds the @coredump_mask field to struct pidfd_info.

When a task coredumps the kernel will provide the following information
to userspace in @coredump_mask:

* PIDFD_COREDUMPED is raised if the task did actually coredump.
* PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping (e.g.,
  undumpable).
* PIDFD_COREDUMP_USER is raised if this is a regular coredump and
  doesn't need special care by the coredump server.
* PIDFD_COREDUMP_ROOT is raised if the generated coredump should be
  treated as sensitive and the coredump server should restrict to the
  generated coredump to sufficiently privileged users.

The kernel guarantees that by the time the connection is made the all
PIDFD_INFO_COREDUMP info is available.

Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-5-664f3caf2516@kernel.org
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21 13:59:12 +02:00
Nicolas Pitre 81cf4d7d23 vt: add VT_GETCONSIZECSRPOS to retrieve console size and cursor position
The console dimension and cursor position are available through the
/dev/vcsa interface already. However the /dev/vcsa header format uses
single-byte fields therefore those values are clamped to 255.

As surprizing as this may seem, some people do use 240-column 67-row
screens (a 1920x1080 monitor with 8x16 pixel fonts) which is getting
close to the limit. Monitors with higher resolution are not uncommon
these days (3840x2160 producing a 480x135 character display) and it is
just a matter of time before someone with, say, a braille display using
the Linux VT console and BRLTTY on such a screen reports a bug about
missing and oddly misaligned screen content.

Let's add VT_GETCONSIZECSRPOS for the retrieval of console size and cursor
position without byte-sized limitations. The actual console size limit as
encoded in vt.c is 32767x32767 so using a short here is appropriate. Then
this can be used to get the cursor position when /dev/vcsa reports 255.

The screen dimension may already be obtained using TIOCGWINSZ and adding
the same information to VT_GETCONSIZECSRPOS might be redundant. However
applications that care about cursor position also care about display
size and having 2 separate system calls to obtain them separately is
wasteful. Also, the cursor position can be queried by writing "\e[6n" to
a tty and reading back the result but that may be done only by the actual
application using that tty and not a sideline observer.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Link: https://lore.kernel.org/r/20250520171851.1219676-3-nico@fluxnic.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21 13:41:03 +02:00
Nicolas Pitre 80fa7a0337 vt: bracketed paste support
This is comprised of 3 aspects:

- Take note of when applications advertise bracketed paste support via
  "\e[?2004h" and "\e[?2004l".

- Insert bracketed paste markers ("\e[200~" and "\e[201~") around pasted
  content in paste_selection() when bracketed paste is active.

- Add TIOCL_GETBRACKETEDPASTE to return bracketed paste status so user
  space daemons implementing cut-and-paste functionality (e.g. gpm,
  BRLTTY) may know when to insert bracketed paste markers.

Link: https://en.wikipedia.org/wiki/Bracketed-paste

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20250520171851.1219676-2-nico@fluxnic.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21 13:41:03 +02:00
Wang Yaxin 0bf2d838de taskstats: fix struct taskstats breaks backward compatibility since version 15
Problem
========
commit 658eb5ab91 ("delayacct: add delay max to record delay peak")
  - adding more fields
commit f65c64f311 ("delayacct: add delay min to record delay peak")
  - adding more fields
commit b016d08737 ("taskstats: modify taskstats version")
 - version bump to 15

Since version 15 (TASKSTATS_VERSION=15) the new layout of the structure
adds fields in the middle of the structure, rendering all old software
incompatible with newer kernels and software compiled against the new
kernel headers incompatible with older kernels.

Solution
=========
move delay max and delay min to the end of taskstat, and bump
the version to 16 after the change

[wang.yaxin@zte.com.cn: adjust indentation]
  Link: https://lkml.kernel.org/r/202505192131489882NSciXV4EGd8zzjLuwoOK@zte.com.cn
Link: https://lkml.kernel.org/r/20250510155413259V4JNRXxukdDgzsaL0Fo6a@zte.com.cn
Fixes: f65c64f311 ("delayacct: add delay min to record delay peak")
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-20 22:49:39 -07:00
Radim Krčmář 5b9db9c16f RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
Add a toggleable VM capability to reset the VCPU from userspace by
setting MP_STATE_INIT_RECEIVED through IOCTL.

Reset through a mp_state to avoid adding a new IOCTL.
Do not reset on a transition from STOPPED to RUNNABLE, because it's
better to avoid side effects that would complicate userspace adoption.
The MP_STATE_INIT_RECEIVED is not a permanent mp_state -- IOCTL resets
the VCPU while preserving the original mp_state -- because we wouldn't
gain much from having a new state it in the rest of KVM, but it's a very
non-standard use of the IOCTL.

Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250515143723.2450630-5-rkrcmar@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21 09:34:57 +05:30
Ming Lei 53f427e794 ublk: support UBLK_AUTO_BUF_REG_FALLBACK
For UBLK_F_AUTO_BUF_REG, buffer is registered to uring_cmd context
automatically with the provided buffer index. User may provide one wrong
buffer index, or the specified buffer is registered by application already.

Add UBLK_AUTO_BUF_REG_FALLBACK for supporting to auto buffer registering
fallback by completing the uring_cmd and telling ublk server the
register failure via UBLK_AUTO_BUF_REG_FALLBACK, then ublk server still
can register the buffer from userspace.

So we can provide reliable way for supporting auto buffer register.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250520045455.515691-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20 10:24:45 -06:00
Ming Lei 99c1e4eb6a ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG
Add UBLK_F_AUTO_BUF_REG for supporting to register buffer automatically
to local io_uring context with provided buffer index.

Add UAPI structure `struct ublk_auto_buf_reg` for holding user parameter
to register request buffer automatically, one 'flags' field is defined, and
there is still 32bit available for future extension, such as, adding one
io_ring FD field for registering buffer to external io_uring.

`struct ublk_auto_buf_reg` is populated from ublk uring_cmd's sqe->addr,
and all existing ublk commands are data-less, so it is just fine to reuse
sqe->addr for this purpose.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250520045455.515691-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20 10:24:45 -06:00
Marc Zyngier a7484c80e5 KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
Since we're (almost) feature complete, let's allow userspace to
request KVM_ARM_VCPU_EL2* by bumping KVM_VCPU_MAX_FEATURES up.

We also now advertise the features to userspace with new capabilities.

It's going to be great...

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20250514103501.2225951-17-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
Jakub Kicinski 1119e5519d net: sched: uapi: add more sanely named duplicate defines
The TCA_FLOWER_KEY_CFM enum has a UNSPEC and MAX with _OPT
in the name, but the real attributes don't. Add a MAX that
more reasonably matches the attrs.

The PAD in TCA_TAPRIO is the only attr which doesn't have
_ATTR in it, perhaps signifying that it's not a real attr?
If so interesting idea in abstract but it makes codegen painful.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250513221752.843102-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15 11:44:29 -07:00
Stanislav Fomichev 8802087d20 net: devmem: TCP tx netlink api
Add bind-tx netlink call to attach dmabuf for TX; queue is not
required, only ifindex and dmabuf fd for attachment.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250508004830.4100853-4-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-13 11:12:48 +02:00
Greg Kroah-Hartman ab6dc9a6c7 Merge 6.15-rc6 into usb-next
We need the USB fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-13 08:26:58 +02:00
Andrei Vagin a516403787 fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions
Patch series "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard
regions", v2.

Introduce the PAGE_IS_GUARD flag in the PAGEMAP_SCAN ioctl to expose
information about guard regions.  This allows userspace tools, such as
CRIU, to detect and handle guard regions.

Currently, CRIU utilizes PAGEMAP_SCAN as a more efficient alternative to
parsing /proc/pid/pagemap.  Without this change, guard regions are
incorrectly reported as swap-anon regions, leading CRIU to attempt dumping
them and subsequently failing.

The series includes updates to the documentation and selftests to reflect
the new functionality.


This patch (of 3):

Introduce the PAGE_IS_GUARD flag in the PAGEMAP_SCAN ioctl to expose
information about guard regions.  This allows userspace tools, such as
CRIU, to detect and handle guard regions.

Link: https://lkml.kernel.org/r/20250324065328.107678-1-avagin@google.com
Link: https://lkml.kernel.org/r/20250324065328.107678-2-avagin@google.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:16 -07:00
Dmitry V. Levin 26bb32768f ptrace: introduce PTRACE_SET_SYSCALL_INFO request
PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of system
calls the tracee is blocked in.

This API allows ptracers to obtain and modify system call details in a
straightforward and architecture-agnostic way, providing a consistent way
of manipulating the system call number and arguments across architectures.

As in case of PTRACE_GET_SYSCALL_INFO, PTRACE_SET_SYSCALL_INFO also does
not aim to address numerous architecture-specific system call ABI
peculiarities, like differences in the number of system call arguments for
such system calls as pread64 and preadv.

The current implementation supports changing only those bits of system
call information that are used by strace system call tampering, namely,
syscall number, syscall arguments, and syscall return value.

Support of changing additional details returned by
PTRACE_GET_SYSCALL_INFO, such as instruction pointer and stack pointer,
could be added later if needed, by using struct ptrace_syscall_info.flags
to specify the additional details that should be set.  Currently, "flags"
and "reserved" fields of struct ptrace_syscall_info must be initialized
with zeroes; "arch", "instruction_pointer", and "stack_pointer" fields are
currently ignored.

PTRACE_SET_SYSCALL_INFO currently supports only PTRACE_SYSCALL_INFO_ENTRY,
PTRACE_SYSCALL_INFO_EXIT, and PTRACE_SYSCALL_INFO_SECCOMP operations. 
Other operations could be added later if needed.

Ideally, PTRACE_SET_SYSCALL_INFO should have been introduced along with
PTRACE_GET_SYSCALL_INFO, but it didn't happen.  The last straw that
convinced me to implement PTRACE_SET_SYSCALL_INFO was apparent failure to
provide an API of changing the first system call argument on riscv
architecture.

ptrace(2) man page:

long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
...
PTRACE_SET_SYSCALL_INFO
       Modify information about the system call that caused the stop.
       The "data" argument is a pointer to struct ptrace_syscall_info
       that specifies the system call information to be set.
       The "addr" argument should be set to sizeof(struct ptrace_syscall_info)).

Link: https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055eeaac@gmail.com/
Link: https://lkml.kernel.org/r/20250303112044.GF24170@strace.io
Signed-off-by: Dmitry V. Levin <ldv@strace.io>
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Eugene Syromiatnikov <esyr@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: anton ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davide Berardi <berardi.dav@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Eugene Syromyatnikov <evgsyr@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Renzo Davoi <renzo@cs.unibo.it>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:48:15 -07:00
Dave Airlie 1faeeb315f amd-drm-next-6.16-2025-05-09:
amdgpu:
 - IPS fixes
 - DSC cleanup
 - DC Scaling updates
 - DC FP fixes
 - Fused I2C-over-AUX updates
 - SubVP fixes
 - Freesync fix
 - DMUB AUX fixes
 - VCN fix
 - Hibernation fixes
 - HDP fixes
 - DCN 2.1 fixes
 - DPIA fixes
 - DMUB updates
 - Use drm_file_err in amdgpu
 - Enforce isolation updates
 - Use new dma_fence helpers
 - USERQ fixes
 - Documentation updates
 - Misc code cleanups
 - SR-IOV updates
 - RAS updates
 - PSP 12 cleanups
 
 amdkfd:
 - Update error messages for SDMA
 - Userptr updates
 
 drm:
 - Add drm_file_err function
 
 dma-buf:
 - Add a helper to sort and deduplicate dma_fence arrays
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCaB6KhwAKCRC93/aFa7yZ
 2FYIAQD43sIYoZCo6l7lZt0m54d6Mt1jangVhMK/jH31eOgKYQEAoRJKSMjT6Ktl
 FLBzG8FXekwQELNu6EgyG8Ywwim9AQ0=
 =dfkK
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.16-2025-05-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.16-2025-05-09:

amdgpu:
- IPS fixes
- DSC cleanup
- DC Scaling updates
- DC FP fixes
- Fused I2C-over-AUX updates
- SubVP fixes
- Freesync fix
- DMUB AUX fixes
- VCN fix
- Hibernation fixes
- HDP fixes
- DCN 2.1 fixes
- DPIA fixes
- DMUB updates
- Use drm_file_err in amdgpu
- Enforce isolation updates
- Use new dma_fence helpers
- USERQ fixes
- Documentation updates
- Misc code cleanups
- SR-IOV updates
- RAS updates
- PSP 12 cleanups

amdkfd:
- Update error messages for SDMA
- Userptr updates

drm:
- Add drm_file_err function

dma-buf:
- Add a helper to sort and deduplicate dma_fence arrays

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250509230951.3871914-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-12 07:14:34 +10:00
Gal Pressman 1b2900db01 ethtool: Block setting of symmetric RSS when non-symmetric rx-flow-hash is requested
Symmetric RSS hash requires that:
* No other fields besides IP src/dst and/or L4 src/dst are set
* If src is set, dst must also be set

This restriction was only enforced when RXNFC was configured after
symmetric hash was enabled. In the opposite order of operations (RXNFC
then symmetric enablement) the check was not performed.

Perform the sanity check on set_rxfh as well, by iterating over all flow
types hash fields and making sure they are all symmetric.

Introduce a function that returns whether a flow type is hashable (not
spec only) and needs to be iterated over. To make sure that no one
forgets to update the list of hashable flow types when adding new flow
types, a static assert is added to draw the developer's attention.

The conversion of uapi #defines to enum is not ideal, but as Jakub
mentioned [1], we have precedent for that.

[1] https://lore.kernel.org/netdev/20250324073509.6571ade3@kernel.org/

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250508103034.885536-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-09 16:24:28 -07:00
Jiri Olsa 8231533340 bpf: Add support to retrieve ref_ctr_offset for uprobe perf link
Adding support to retrieve ref_ctr_offset for uprobe perf link,
which got somehow omitted from the initial uprobe link info changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/bpf/20250509153539.779599-2-jolsa@kernel.org
2025-05-09 13:01:07 -07:00
Keke Li 6d406187eb media: uapi: Add stats info and parameters buffer for C3 ISP
Add a header that describes the 3A statistics buffer and the
parameters buffer for C3 ISP

Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Keke Li <keke.li@amlogic.com>
Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-05-09 12:08:37 +02:00
Keke Li a3aa115af2 media: Add C3ISP_PARAMS and C3ISP_STATS meta formats
C3ISP_PARAMS is the C3 ISP Parameters format.
C3ISP_STATS is the C3 ISP Statistics format.

Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Keke Li <keke.li@amlogic.com>
Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-05-09 12:08:37 +02:00
Jakub Kicinski 6b02fd7799 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc6).

No conflicts.

Adjacent changes:

net/core/dev.c:
  08e9f2d584 ("net: Lock netdevices during dev_shutdown")
  a82dc19db1 ("net: avoid potential race between netdev_get_by_index_lock() and netns switch")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-08 08:59:02 -07:00
Srinivas Pandruvada d6644d737b
platform/x86: ISST: Support SST-PP revision 2
SST PP revision 2 added fabric 1 P0, P1 and Pm frequencies. Export them
by using a new IOCTL ISST_IF_GET_PERF_LEVEL_FABRIC_INFO. This IOCTL
requires platforms with SST PP revision 2 or higher.

To accommodate potential future increases in fabric count and avoid ABI
changes, support is extended for up to 8 fabrics.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20250506163531.1061185-3-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-08 16:04:06 +03:00
Paul Chaignon f5c79ffdc2 bpf: Clarify handling of mark and tstamp by redirect_peer
When switching network namespaces with the bpf_redirect_peer helper, the
skb->mark and skb->tstamp fields are not zeroed out like they can be on
a typical netns switch. This patch clarifies that in the helper
description.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/ccc86af26d43c5c0b776bcba2601b7479c0d46d0.1746460653.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-07 18:16:33 -07:00
John Garry 5d894321c4 fs: add atomic write unit max opt to statx
XFS will be able to support large atomic writes (atomic write > 1x block)
in future. This will be achieved by using different operating methods,
depending on the size of the write.

Specifically a new method of operation based in FS atomic extent remapping
will be supported in addition to the current HW offload-based method.

The FS method will generally be appreciably slower performing than the
HW-offload method. However the FS method will be typically able to
contribute to achieving a larger atomic write unit max limit.

XFS will support a hybrid mode, where HW offload method will be used when
possible, i.e. HW offload is used when the length of the write is
supported, and for other times FS-based atomic writes will be used.

As such, there is an atomic write length at which the user may experience
appreciably slower performance.

Advertise this limit in a new statx field, stx_atomic_write_unit_max_opt.

When zero, it means that there is no such performance boundary.

Masks STATX{_ATTR}_WRITE_ATOMIC can be used to get this new field. This is
ok for older kernels which don't support this new field, as they would
report 0 in this field (from zeroing in cp_statx()) already. Furthermore
those older kernels don't support large atomic writes - apart from block
fops, but there would be consistent performance there for atomic writes
in range [unit min, unit max].

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
2025-05-07 14:25:30 -07:00
Jakub Kicinski 9daaf19786 wireless features, notably
* stack
    - free SKBTX_WIFI_STATUS flag
    - fixes for VLAN multicast in multi-link
    - improve codel parameters (revert some old twiddling)
  * ath12k
    - Enable AHB support for IPQ5332.
    - Add monitor interface support to QCN9274.
    - Add MLO support to WCN7850.
    - Add 802.11d scan offload support to WCN7850.
  * ath11k
    - Restore hibernation support
  * iwlwifi
    - EMLSR on two 5 GHz links
  * mwifiex
    - cleanups/refactoring
 
 along with many other small features/cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmgaSmcACgkQ10qiO8sP
 aAA4hA//f/nfeLTAnhON53mDlqxa55/2bw9XSH7pOIOasVBWxmuYxhWfn5uiZluI
 zOGlBO7vtJYvPrVHEHSuPWMNCQ+ieL2ShRSP5BBfy3KBYdD4gKKAd95WoiXmRVp4
 d13OYtF9msFbVXOZYMyxMHmzIrWlBQIokjOSjqjSBeYRnD8U0GRemiSecugWo/qI
 bE7xcZuTgKBy+gr7242017DcUjWBdWcsp1C6C+COZm/KrSihQ0SQ4PIcOZgPsZjl
 COGextZltbWW56qnlp6QC394V+Vhah+Owcz3Qqz9zZ7hzJnuPo+DnpPMShhRGruL
 /IgqKhzcuye5UUJJl8nD768x6ebClchkcBC+A/hfjk5UYVl/oZxA1Bw5fC2O+5VU
 ycDMHr1Qu/yEE2rbwIJPGNOZ7NisqJFF07CPFuygKjBGNnp7I5S7H6UJsKRi5MuZ
 0CXHiFMhuHgWLmDFauIa66XI1JpqIzQgbZSjqVYFKRqwEz3yRKwihwDzZy1m6pDQ
 NhMFedQznFkMSJ+m020IOxxXYPy98PHcus+TZWL4os0SJTfEycEUWJZnlJ47Bb25
 Z9IF1OER2I4giMKxUVKRoDq0SGStbZODwqxrVCRD261Um9ybbfDdRbDeDOmvY+Gu
 jEmE6tscWbRfDvCS2M+xIg+MAcVHJq5gO4S8RopMSJlFHY/T/uE=
 =s9mS
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
wireless features, notably

 * stack
   - free SKBTX_WIFI_STATUS flag
   - fixes for VLAN multicast in multi-link
   - improve codel parameters (revert some old twiddling)
 * ath12k
   - Enable AHB support for IPQ5332.
   - Add monitor interface support to QCN9274.
   - Add MLO support to WCN7850.
   - Add 802.11d scan offload support to WCN7850.
 * ath11k
   - Restore hibernation support
 * iwlwifi
   - EMLSR on two 5 GHz links
 * mwifiex
   - cleanups/refactoring

along with many other small features/cleanups

* tag 'wireless-next-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (177 commits)
  Revert "wifi: iwlwifi: clean up config macro"
  wifi: iwlwifi: move phy_filters to fw_runtime
  wifi: iwlwifi: pcie: make sure to lock rxq->read
  wifi: iwlwifi: add definitions for iwl_mac_power_cmd version 2
  wifi: iwlwifi: clean up config macro
  wifi: iwlwifi: mld: simplify iwl_mld_rx_fill_status()
  wifi: iwlwifi: mld: rx: simplify channel handling
  wifi: iwlwifi: clean up band in RX metadata
  wifi: iwlwifi: mld: skip unknown FW channel load values
  wifi: iwlwifi: define API for external FSEQ images
  wifi: iwlwifi: mld: allow EMLSR on separated 5 GHz subbands
  wifi: iwlwifi: mld: use cfg80211_chandef_get_width()
  wifi: iwlwifi: mld: fix iwl_mld_emlsr_disallowed_with_link() return
  wifi: iwlwifi: mld: clarify variable type
  wifi: iwlwifi: pcie: add support for the reset handshake in MSI
  wifi: mac80211_hwsim: Prevent tsf from setting if beacon is disabled
  wifi: mac80211: restructure tx profile retrieval for MLO MBSSID
  wifi: nl80211: add link id of transmitted profile for MLO MBSSID
  wifi: ieee80211: Add helpers to fetch EMLSR delay and timeout values
  wifi: mac80211: update ML STA with EML capabilities
  ...

====================

Link: https://patch.msgid.link/20250506174656.119970-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-06 19:04:42 -07:00
Jiri Pirko 429ac62114 devlink: define enum for attr types of dynamic attributes
Devlink param and health reporter fmsg use attributes with dynamic type
which is determined according to a different type. Currently used values
are NLA_*. The problem is, they are not part of UAPI. They may change
which would cause a break.

To make this future safe, introduce a enum that shadows NLA_* values in
it and is part of UAPI.

Also, this allows to possibly carry types that are unrelated to NLA_*
values.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250505114513.53370-3-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-06 18:21:11 -07:00
Pavel Begunkov a5c98e9424 io_uring/zcrx: dmabuf backed zerocopy receive
Add support for dmabuf backed zcrx areas. To use it, the user should
pass IORING_ZCRX_AREA_DMABUF in the struct io_uring_zcrx_area_reg flags
field and pass a dmabuf fd in the dmabuf_fd field.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20bb1890e60a82ec945ab36370d1fd54be414ab6.1746097431.git.asml.silence@gmail.com
Link: https://lore.kernel.org/io-uring/6e37db97303212bbd8955f9501cf99b579f8aece.1746547722.git.asml.silence@gmail.com
[axboe: fold in fixup]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-06 10:11:00 -06:00
Keith Busch 02040353f4 io_uring: enable per-io write streams
Allow userspace to pass a per-I/O write stream in the SQE:

      __u8 write_stream;

The __u8 type matches the size the filesystems and block layer support.

Application can query the supported values from the block devices
max_write_streams sysfs attribute. Unsupported values are ignored by
file operations that do not support write streams or rejected with an
error by those that support them.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20250506121732.8211-7-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-06 07:46:43 -06:00
Dave Airlie 5e0c679981 Linux 6.15-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgX1CgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGxiIH/A7LHlVatGEQgRFi
 0JALDgcuGTMtMU1qD43rv8Z1GXqTpCAlaBt9D1C9cUH/86MGyBTVRWgVy0wkaU2U
 8QSfFWQIbrdaIzelHtzmAv5IDtb+KrcX1iYGLcMb6ZYaWkv8/CMzMX1nkgxEr1QT
 37Xo3/F17yJumAdNQxdRhVLGy2d3X5rScecpufwh97sMwoddllMCDs2LIoeSAYpG
 376/wzni09G2fADa8MEKqcaMue4qcf0FOo/gOkT8YwFGSZLKa6uumlBLg04QoCt0
 foK2vfcci1q4H4ZbCu3uQESYGLQHY0f2ICDCwC3m25VF9a81TmlbC3MLum3vhmKe
 RtLDcXg=
 =xyaI
 -----END PGP SIGNATURE-----

BackMerge tag 'v6.15-rc5' into drm-next

Linux 6.15-rc5, requested by tzimmerman for fixes required in drm-next.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-06 16:39:25 +10:00
Christoph Hellwig eeadd68e2a block: remove bounce buffering support
The block layer bounce buffering support is unused now, remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20250505081138.3435992-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05 13:22:39 -06:00
Kevin Wolf 7734fb4ad9 dm mpath: Interface for explicit probing of active paths
Multipath cannot directly provide failover for ioctls in the kernel
because it doesn't know what each ioctl means and which result could
indicate a path error. Userspace generally knows what the ioctl it
issued means and if it might be a path error, but neither does it know
which path the ioctl took nor does it necessarily have the privileges to
fail a path using the control device.

In order to allow userspace to address this situation, implement a
DM_MPATH_PROBE_PATHS ioctl that prompts the dm-mpath driver to probe all
active paths in the current path group to see whether they still work,
and fail them if not. If this returns success, userspace can retry the
ioctl and expect that the previously hit bad path is now failed (or
working again).

The immediate motivation for this is the use of SG_IO in QEMU for SCSI
passthrough. Following a failed SG_IO ioctl, QEMU will trigger probing
to ensure that all active paths are actually alive, so that retrying
SG_IO at least has a lower chance of failing due to a path error.
However, the problem is broader than just SG_IO (it affects any ioctl),
and if applications need failover support for other ioctls, the same
probing can be used.

This is not implemented on the DM control device, but on the DM mpath
block devices, to allow all users who have access to such a block device
to make use of this interface, specifically to implement failover for
ioctls. For the same reason, it is also unprivileged. Its implementation
is effectively just a bunch of reads, which could already be issued by
userspace, just without any guarantee that all the rights paths are
selected.

The probing implemented here is done fully synchronously path by path;
probing all paths concurrently is left as an improvement for the future.

Co-developed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-05-04 11:35:06 +02:00
Peter Zijlstra c042c50521 futex: Implement FUTEX2_MPOL
Extend the futex2 interface to be aware of mempolicy.

When FUTEX2_MPOL is specified and there is a MPOL_PREFERRED or
home_node specified covering the futex address, use that hash-map.

Notably, in this case the futex will go to the global node hashtable,
even if it is a PRIVATE futex.

When FUTEX2_NUMA|FUTEX2_MPOL is specified and the user specified node
value is FUTEX_NO_NODE, the MPOL lookup (as described above) will be
tried first before reverting to setting node to the local node.

[bigeasy: add CONFIG_FUTEX_MPOL, add MPOL to FUTEX2_VALID_MASK, write
the node only to user if FUTEX_NO_NODE was supplied]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-18-bigeasy@linutronix.de
2025-05-03 12:02:09 +02:00
Peter Zijlstra cec199c5e3 futex: Implement FUTEX2_NUMA
Extend the futex2 interface to be numa aware.

When FUTEX2_NUMA is specified for a futex, the user value is extended
to two words (of the same size). The first is the user value we all
know, the second one will be the node to place this futex on.

  struct futex_numa_32 {
	u32 val;
	u32 node;
  };

When node is set to ~0, WAIT will set it to the current node_id such
that WAKE knows where to find it. If userspace corrupts the node value
between WAIT and WAKE, the futex will not be found and no wakeup will
happen.

When FUTEX2_NUMA is not set, the node is simply an extension of the
hash, such that traditional futexes are still interleaved over the
nodes.

This is done to avoid having to have a separate !numa hash-table.

[bigeasy: ensure to have at least hashsize of 4 in futex_init(), add
pr_info() for size and allocation information. Cast the naddr math to
void*]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-17-bigeasy@linutronix.de
2025-05-03 12:02:09 +02:00
Sebastian Andrzej Siewior 63e8595c06 futex: Allow to make the private hash immutable
My initial testing showed that:

	perf bench futex hash

reported less operations/sec with private hash. After using the same
amount of buckets in the private hash as used by the global hash then
the operations/sec were about the same.

This changed once the private hash became resizable. This feature added
an RCU section and reference counting via atomic inc+dec operation into
the hot path.
The reference counting can be avoided if the private hash is made
immutable.
Extend PR_FUTEX_HASH_SET_SLOTS by a fourth argument which denotes if the
private should be made immutable. Once set (to true) the a further
resize is not allowed (same if set to global hash).
Add PR_FUTEX_HASH_GET_IMMUTABLE which returns true if the hash can not
be changed.
Update "perf bench" suite.

For comparison, results of "perf bench futex hash -s":
- Xeon CPU E5-2650, 2 NUMA nodes, total 32 CPUs:
  - Before the introducing task local hash
    shared  Averaged 1.487.148 operations/sec (+- 0,53%), total secs = 10
    private Averaged 2.192.405 operations/sec (+- 0,07%), total secs = 10

  - With the series
    shared  Averaged 1.326.342 operations/sec (+- 0,41%), total secs = 10
    -b128   Averaged   141.394 operations/sec (+- 1,15%), total secs = 10
    -Ib128  Averaged   851.490 operations/sec (+- 0,67%), total secs = 10
    -b8192  Averaged   131.321 operations/sec (+- 2,13%), total secs = 10
    -Ib8192 Averaged 1.923.077 operations/sec (+- 0,61%), total secs = 10
    128 is the default allocation of hash buckets.
    8192 was the previous amount of allocated hash buckets.

- Xeon(R) CPU E7-8890 v3, 4 NUMA nodes, total 144 CPUs:
  - Before the introducing task local hash
    shared   Averaged 1.810.936 operations/sec (+- 0,26%), total secs = 20
    private  Averaged 2.505.801 operations/sec (+- 0,05%), total secs = 20

  - With the series
    shared   Averaged 1.589.002 operations/sec (+- 0,25%), total secs = 20
    -b1024   Averaged    42.410 operations/sec (+- 0,20%), total secs = 20
    -Ib1024  Averaged   740.638 operations/sec (+- 1,51%), total secs = 20
    -b65536  Averaged    48.811 operations/sec (+- 1,35%), total secs = 20
    -Ib65536 Averaged 1.963.165 operations/sec (+- 0,18%), total secs = 20
    1024 is the default allocation of hash buckets.
    65536 was the previous amount of allocated hash buckets.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://lore.kernel.org/r/20250416162921.513656-16-bigeasy@linutronix.de
2025-05-03 12:02:08 +02:00
Sebastian Andrzej Siewior 80367ad01d futex: Add basic infrastructure for local task local hash
The futex hash is system wide and shared by all tasks. Each slot
is hashed based on futex address and the VMA of the thread. Due to
randomized VMAs (and memory allocations) the same logical lock (pointer)
can end up in a different hash bucket on each invocation of the
application. This in turn means that different applications may share a
hash bucket on the first invocation but not on the second and it is not
always clear which applications will be involved. This can result in
high latency's to acquire the futex_hash_bucket::lock especially if the
lock owner is limited to a CPU and can not be effectively PI boosted.

Introduce basic infrastructure for process local hash which is shared by
all threads of process. This hash will only be used for a
PROCESS_PRIVATE FUTEX operation.

The hashmap can be allocated via:

        prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, num);

A `num' of 0 means that the global hash is used instead of a private
hash.
Other values for `num' specify the number of slots for the hash and the
number must be power of two, starting with two.
The prctl() returns zero on success. This function can only be used
before a thread is created.

The current status for the private hash can be queried via:

        num = prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS);

which return the current number of slots. The value 0 means that the
global hash is used. Values greater than 0 indicate the number of slots
that are used. A negative number indicates an error.

For optimisation, for the private hash jhash2() uses only two arguments
the address and the offset. This omits the VMA which is always the same.

[peterz: Use 0 for global hash. A bit shuffling and renaming. ]

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-13-bigeasy@linutronix.de
2025-05-03 12:02:07 +02:00
Jakub Kicinski 337079d31f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc5).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01 15:11:38 -07:00
Linus Torvalds ebd297a2af Happy May Day.
Things have calmed down on our end (knock on wood), no outstanding
 investigations. Including fixes from Bluetooth and WiFi.
 
 Current release - fix to a fix:
 
  - igc: fix lock order in igc_ptp_reset
 
 Current release - new code bugs:
 
  - Revert "wifi: iwlwifi: make no_160 more generic", fixes regression
    to Killer line of devices reported by a number of people
 
  - Revert "wifi: iwlwifi: add support for BE213", initial FW is too buggy
 
  - number of fixes for mld, the new Intel WiFi subdriver
 
 Previous releases - regressions:
 
  - wifi: mac80211: restore monitor for outgoing frames
 
  - drv: vmxnet3: fix malformed packet sizing in vmxnet3_process_xdp
 
  - eth: bnxt_en: fix timestamping FIFO getting out of sync on reset,
    delivering stale timestamps
 
  - use sock_gen_put() in the TCP fraglist GRO heuristic, don't assume
    every socket is a full socket
 
 Previous releases - always broken:
 
  - sched: adapt qdiscs for reentrant enqueue cases, fix list corruptions
 
  - xsk: fix race condition in AF_XDP generic RX path, shared UMEM
    can't be protected by a per-socket lock
 
  - eth: mtk-star-emac: fix spinlock recursion issues on rx/tx poll
 
  - btusb: avoid NULL pointer dereference in skb_dequeue()
 
  - dsa: felix: fix broken taprio gate states after clock jump
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmgTmIoACgkQMUZtbf5S
 IruoZhAApd5csY/U9Pl7h5CK6mQek7qk+EbDcLs+/j/0UFXcOayuFKILHT27BIe5
 gEVbQBkcEpzj+QKJblWvuczreBXChTmhfQb1zItRMWJktYmlqYtPsPE4kM2inzGo
 94iDeSTca/LKPfc4dPQ/rBfXMQb7NX7UhWRR9AFBWhOuM/SDjItyZUFPqhmGDctO
 8+7YdftK2qFcwIN18kU4GqbmJEZK9Y4+ePabABR7VBFtw70bpdCptWp5tmRMbq6V
 KphxkWbWg8SP1HUnAfRAS4C3TO4Dn6e7CVaAirroqrHWmawiWgHV/o1awc0ErO8k
 aAgVhPOB5zzVqbV0ZeoEj5PL5/2rus/KmWkMludRAqGGivcQOu6GOuvo+QPwpYCU
 3edxCU95kqRB6XrjsLnZ6HnpwtBr6CUVf9FXMGNt4199cWN/PE+w4qxTz2vxiiH3
 At0su5W41s4mvldktait+ilwk+R3v1rIzTR6IDmNPW8pACI8RlQt8/Mgy7smDZp9
 FEKtWhXHdww3DosB3vHOGZ5u6gRlGvfjPsJlt2C9fGZIdrOJzzHDZ37BXDYDTYnx
 9mejzy/YAYjK826YGA3sqJ3fUDFUoAynZuajDKxErvtIZlRzEDLu+7pG4zAB8Nrx
 tHdz5iR2rDdZ87S4jB8CAP7FGVs+YZkekVvY5fIw4dGs1QMc2wE=
 =6xNz
 -----END PGP SIGNATURE-----

Merge tag 'net-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Happy May Day.

  Things have calmed down on our end (knock on wood), no outstanding
  investigations. Including fixes from Bluetooth and WiFi.

  Current release - fix to a fix:

   - igc: fix lock order in igc_ptp_reset

  Current release - new code bugs:

   - Revert "wifi: iwlwifi: make no_160 more generic", fixes regression
     to Killer line of devices reported by a number of people

   - Revert "wifi: iwlwifi: add support for BE213", initial FW is too
     buggy

   - number of fixes for mld, the new Intel WiFi subdriver

  Previous releases - regressions:

   - wifi: mac80211: restore monitor for outgoing frames

   - drv: vmxnet3: fix malformed packet sizing in vmxnet3_process_xdp

   - eth: bnxt_en: fix timestamping FIFO getting out of sync on reset,
     delivering stale timestamps

   - use sock_gen_put() in the TCP fraglist GRO heuristic, don't assume
     every socket is a full socket

  Previous releases - always broken:

   - sched: adapt qdiscs for reentrant enqueue cases, fix list
     corruptions

   - xsk: fix race condition in AF_XDP generic RX path, shared UMEM
     can't be protected by a per-socket lock

   - eth: mtk-star-emac: fix spinlock recursion issues on rx/tx poll

   - btusb: avoid NULL pointer dereference in skb_dequeue()

   - dsa: felix: fix broken taprio gate states after clock jump"

* tag 'net-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  net: vertexcom: mse102x: Fix RX error handling
  net: vertexcom: mse102x: Add range check for CMD_RTS
  net: vertexcom: mse102x: Fix LEN_MASK
  net: vertexcom: mse102x: Fix possible stuck of SPI interrupt
  net: hns3: defer calling ptp_clock_register()
  net: hns3: fixed debugfs tm_qset size
  net: hns3: fix an interrupt residual problem
  net: hns3: store rx VLAN tag offload state for VF
  octeon_ep: Fix host hang issue during device reboot
  net: fec: ERR007885 Workaround for conventional TX
  net: lan743x: Fix memleak issue when GSO enabled
  ptp: ocp: Fix NULL dereference in Adva board SMA sysfs operations
  net: use sock_gen_put() when sk_state is TCP_TIME_WAIT
  bnxt_en: fix module unload sequence
  bnxt_en: Fix ethtool -d byte order for 32-bit values
  bnxt_en: Fix out-of-bound memcpy() during ethtool -w
  bnxt_en: Fix coredump logic to free allocated buffer
  bnxt_en: delay pci_alloc_irq_vectors() in the AER path
  bnxt_en: call pci_alloc_irq_vectors() after bnxt_reserve_rings()
  bnxt_en: Add missing skb_mark_for_recycle() in bnxt_rx_vlan()
  ...
2025-05-01 10:37:49 -07:00
Hans Verkuil d12ddda523 media: uapi: cec-funcs.h: use CEC_LOG_ADDR_BROADCAST
The cec-funcs.h header sets the destination to 0xf for those
messages that can only be broadcast. Instead of writing:

	msg->msg[0] |= 0xf; /* broadcast */

just write:

	msg->msg[0] |= CEC_LOG_ADDR_BROADCAST;

which is more descriptive and allows us to drop the comment.

Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2025-04-30 08:16:07 +02:00
Jakub Kicinski 1f773970a7 netfilter pull request 25-04-29
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmgP++oACgkQ1w0aZmrP
 KyHxVw/5ASwm5WPt7qRgSLKlbIzUbTQ5a2CEC7WS6ova5M0nvxFL6f/JACeK3tTt
 nC+2+Vx89PFYB1fcGE9MfrZBFf/Ccm9/leluKtYJ0l7VnhLAxAByn5EbExhTFPeP
 YA3neewU62x8Cm3JQ6pZrFQYjz7pEGfSgjJ6E2dncIDMcjQzFAnDxEAB8VDqzFde
 yAosgA3yJtvan2otuqAqg33i0Q32FM0rhFplIB0pvJpbq24YVs1vTupSp9xEAeDu
 Ms5dcm2fN1L2HEgt2xhDrhVLEIp3CDZpC0k9hf7eJkonm03nCBlfqUd6gPFwdz0j
 YoHziIDO4zdtefw4Z1Bd31L5BgRyEZkbf6E8qWqCb5hpauKMKNwrtaa2qAMYqWmF
 UQuGXCXkIJVTfwxrCk1zyYKRhc0ECXpah65lB7RwxlAayDWTvEGKdcWKs9Hk5jx3
 hHgxfqO+nRfUjzGap5bWpOt18mfwZn287XSWFK6Ko6m/6UgzORfEEby/YoDUELKL
 xZBWy50+fauSkhFpQXt+IRT5rirmfMYv9hBBweKBwK05txMR4JdTNcI+zZxLMwnK
 sDbRjUsE1Req2ZnJPF33PcbkbvHj/B8m3awPxt2C3/kDK5WpKL3986Bj0hIJ25iM
 12BD5iSfOTGIZJCdRMW7SAoI1h3Z7+cePF4UjxiWS7jCbKE6u2E=
 =tp12
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-25-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next:

1) Replace msecs_to_jiffies() by secs_to_jiffies(), from Easwar Hariharan.

2) Allow to compile xt_cgroup with cgroupsv2 support only,
   from Michal Koutny.

3) Prepare for sock_cgroup_classid() removal by wrapping it around
   ifdef, also from Michal Koutny.

4) Remove redundant pointer fetch on conntrack template, from Xuanqiang Luo.

5) Re-format one block in the tproxy documentation for consistency,
   from Chen Linxuan.

6) Expose set element count and type via netlink attributes,
   from Florian Westphal.

* tag 'nf-next-25-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: export set count and backend name to userspace
  docs: tproxy: fix formatting for nft code block
  netfilter: conntrack: Remove redundant NFCT_ALIGN call
  net: cgroup: Guard users of sock_cgroup_classid()
  netfilter: xt_cgroup: Make it independent from net_cls
  netfilter: xt_IDLETIMER: convert timeouts to secs_to_jiffies()
====================

Link: https://patch.msgid.link/20250428221254.3853-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29 16:31:10 -07:00
Kory Maincent 10c34b7d71 netlink: specs: ethtool: Remove UAPI duplication of phy-upstream enum
The phy-upstream enum is already defined in the ethtool.h UAPI header
and used by the ethtool userspace tool. However, the ethtool spec does
not reference it, causing YNL to auto-generate a duplicate and redundant
enum.

Fix this by updating the spec to reference the existing UAPI enum
in ethtool.h.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250425171419.947352-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28 15:49:47 -07:00
Florian Westphal 0014af8021 netfilter: nf_tables: export set count and backend name to userspace
nf_tables picks a suitable set backend implementation (bitmap, hash,
rbtree..) based on the userspace requirements.

Figuring out the chosen backend requires information about the set flags
and the kernel version.  Export this to userspace so nft can include this
information in '--debug=netlink' output.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-29 00:00:27 +02:00
Alexei Starovoitov 224ee86639 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc4
Cross-merge bpf and other fixes after downstream PRs.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-28 08:40:45 -07:00
Greg Kroah-Hartman 615dca38c2 Linux 6.15-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgOrWseHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGFyIH/AhXcuA8y8rk43mo
 t+0GO7JR4dnr4DIl74GgDjCXlXiKCT7EXMfD/ABdofTxV4Pbyv+pUODlg1E6eO9U
 C1WWM5PPNBGDDEVSQ3Yu756nr0UoiFhvW0R6pVdou5cezCWAtIF9LTN8DEUgis0u
 EUJD9+/cHAMzfkZwabjm/HNsa1SXv2X47MzYv/PdHKr0htEPcNHF4gqBrBRdACGy
 FJtaCKhuPf6TcDNXOFi5IEWMXrugReRQmOvrXqVYGa7rfUFkZgsAzRY6n/rUN5Z9
 FAgle4Vlv9ohVYj9bXX8b6wWgqiKRpoN+t0PpRd6G6ict1AFBobNGo8LH3tYIKqZ
 b/dCGNg=
 =xDGd
 -----END PGP SIGNATURE-----

Merge 6.15-rc4 into usb-next

We need the USB fixes in here as well, and this resolves the following
merge conflicts that were reported in linux-next:

	drivers/usb/chipidea/ci_hdrc_imx.c
	drivers/usb/host/xhci.h

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-28 10:32:58 +02:00
Christian Brauner a71f402acd
pidfs: get rid of __pidfd_prepare()
Fold it into pidfd_prepare() and rename PIDFD_CLONE to PIDFD_STALE to
indicate that the passed pid might not have task linkage and no explicit
check for that should be performed.

Link: https://lore.kernel.org/20250425-work-pidfs-net-v2-3-450a19461e75@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: David Rheinsberg <david@readahead.eu>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-26 08:28:03 +02:00
Jeremy Harris 2b13042d36 tcp: fastopen: pass TFO child indication through getsockopt
tcp: fastopen: pass TFO child indication through getsockopt

Note that this uses up the last bit of a field in struct tcp_info

Signed-off-by: Jeremy Harris <jgh@exim.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250423124334.4916-3-jgh@exim.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-24 18:21:04 -07:00
Linus Torvalds 30e268185e Landlock fix for v6.15-rc4
-----BEGIN PGP SIGNATURE-----
 
 iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCaAp+shAcbWljQGRpZ2lr
 b2QubmV0AAoJEOXj0OiMgvbSAukA/1FkHkyqXaiMYy2s4tlwvpjWMBP11gEUzyYM
 j1HIn3rOAQDUT1V2q+Ff5ZVk4hsCZ8pJMTmrASSL/CHw1yfQHtzQAw==
 =5RX1
 -----END PGP SIGNATURE-----

Merge tag 'landlock-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock fixes from Mickaël Salaün:
 "Fix some Landlock audit issues, add related tests, and updates
  documentation"

* tag 'landlock-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  landlock: Update log documentation
  landlock: Fix documentation for landlock_restrict_self(2)
  landlock: Fix documentation for landlock_create_ruleset(2)
  selftests/landlock: Add PID tests for audit records
  selftests/landlock: Factor out audit fixture in audit_test
  landlock: Log the TGID of the domain creator
  landlock: Remove incorrect warning
2025-04-24 12:59:05 -07:00
Jakub Kicinski 5565acd1e6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc4).

This pull includes wireless and a fix to vxlan which isn't
in Linus's tree just yet. The latter creates with a silent conflict
/ build breakage, so merging it now to avoid causing problems.

drivers/net/vxlan/vxlan_vnifilter.c
  094adad913 ("vxlan: Use a single lock to protect the FDB table")
  087a9eb9e5 ("vxlan: vnifilter: Fix unlocked deletion of default FDB entry")
https://lore.kernel.org/20250423145131.513029-1-idosch@nvidia.com

No "normal" conflicts, or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-24 11:20:52 -07:00
Rameshkumar Sundaram 37523c3c47 wifi: nl80211: add link id of transmitted profile for MLO MBSSID
During non-transmitted (nontx) profile configuration, interface
index of the transmitted (tx) profile is used to retrieve the
wireless device (wdev) associated with it. With MLO, this 'wdev'
may be part of an MLD with more than one link, hence only
interface index is not sufficient anymore to retrieve the correct
tx profile. Add a new attribute to configure link id of tx profile.

Signed-off-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Co-developed-by: Muna Sinada <muna.sinada@oss.qualcomm.com>
Signed-off-by: Muna Sinada <muna.sinada@oss.qualcomm.com>
Co-developed-by: Aloka Dixit <aloka.dixit@oss.qualcomm.com>
Signed-off-by: Aloka Dixit <aloka.dixit@oss.qualcomm.com>
Link: https://patch.msgid.link/20250408184501.3715887-2-aloka.dixit@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-04-23 18:03:30 +02:00
Linus Torvalds 0251ddbffb virtio, vhost: fixes
A small number of fixes.
 
 virtgpu is exempt from reset shutdown fow now -
 	 a more complete fix is in the works
 spec compliance fixes in:
 	virtio-pci cap commands
 	vhost_scsi_send_bad_target
 	virtio console resize
 missing locking fix in vhost-scsi
 virtio ring - a KCSAN false positive fix
 VHOST_*_OWNER documentation fix
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmgIi3cPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpGU8H/1Fq8pH+irGvyH4E21O03qx0wiM+lcYVhNH5
 2a3rjOwuJBiLvscZTJG/w07hIpx0O4WrbygdT0BTll4Uen2C+OpGn/Y1LfhW6wsr
 3yyeBpTr5hKiY8sOD08rMTHTCM4mD8UdYr13RcNq+eUxNZ6bA+kiGaXpIk0AiRPR
 5pdbx16cTZM7k+/9aXp68hRO7yHnyilGzAJG1hHmfx1L5Mt++RVKsf2KI+3YHWcI
 0ZZj/NP3iZfNm57+QpKX6zYikH4IFIer1r9wotMaR74brpuq8w7HKZUqe3VfG11Y
 TBgq6NfDZVq8G8bCGPv+C+DfDnpYMFVYqytCLn4/AyOhLNCRDs8=
 =m8wk
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
 "A small number of fixes:

   - virtgpu is exempt from reset shutdown fow now - a more complete fix
     is in the works

   - spec compliance fixes in:
       - virtio-pci cap commands
       - vhost_scsi_send_bad_target
       - virtio console resize

   - missing locking fix in vhost-scsi

   - virtio ring - a KCSAN false positive fix

   - VHOST_*_OWNER documentation fix"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost-scsi: Fix vhost_scsi_send_status()
  vhost-scsi: Fix vhost_scsi_send_bad_target()
  vhost-scsi: protect vq->log_used with vq->mutex
  vhost_task: fix vhost_task_create() documentation
  virtio_console: fix order of fields cols and rows
  virtio_console: fix missing byte order handling for cols and rows
  virtgpu: don't reset on shutdown
  virtio_ring: Fix data race by tagging event_triggered as racy for KCSAN
  vhost: fix VHOST_*_OWNER documentation
  virtio_pci: Use self group type for cap commands
2025-04-23 08:25:56 -07:00
Omri Mann 98b995660b ublk: Add UBLK_U_CMD_UPDATE_SIZE
Currently ublk only allows the size of the ublkb block device to be
set via UBLK_CMD_SET_PARAMS before UBLK_CMD_START_DEV is triggered.

This does not provide support for extendable user-space block devices
without having to stop and restart the underlying ublkb block device
causing IO interruption.

This patch adds a new ublk command UBLK_U_CMD_UPDATE_SIZE to allow the
ublk block device to be resized on-the-fly.

Feature flag UBLK_F_UPDATE_SIZE is also added to indicate support.

Signed-off-by: Omri Mann <omri@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/2a370ab1-d85b-409d-b762-f9f3f6bdf705@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 09:25:41 -06:00
Alexei Starovoitov 5709be4c35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc3
Cross-merge bpf and other fixes after downstream PRs.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-21 08:04:38 -07:00
Jens Axboe 53db8a71ec io_uring: add support for IORING_OP_PIPE
This works just like pipe2(2), except it also supports fixed file
descriptors. Used in a similar fashion as for other fd instantiating
opcodes (like accept, socket, open, etc), where sqe->file_slot is set
appropriately if two direct descriptors are desired rather than a set
of normal file descriptors.

sqe->addr must be set to a pointer to an array of 2 integers, which
is where the fixed/normal file descriptors are copied to.

sqe->pipe_flags contains flags, same as what is allowed for pipe2(2).

Future expansion of per-op private flags can go in sqe->ioprio,
like we do for other opcodes that take both a "syscall" flag set and
an io_uring opcode specific flag set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:06:58 -06:00
Krishna Chaitanya Chundru 178af54a67 PCI: Add lane equalization register offsets
As per PCIe spec 6.0.1, add PCIe lane equalization register offset for
data rates 8.0 GT/s, 32.0 GT/s and 64.0 GT/s.

Also add a macro for defining data rate 64.0 GT/s physical layer capability
ID.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://patch.msgid.link/20250328-preset_v6-v9-4-22cfa0490518@oss.qualcomm.com
2025-04-19 19:42:43 +05:30
Jakub Kicinski 8066e388be net: add UAPI to the header guard in various network headers
fib_rule, ip6_tunnel, and a whole lot of if_* headers lack the customary
_UAPI in the header guard. Without it YNL build can't protect from in tree
and system headers both getting included. YNL doesn't need most of these
but it's annoying to have to fix them one by one.

Note that header installation strips this _UAPI prefix so this should
result in no change to the end user.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250416200840.1338195-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-17 19:04:50 -07:00
Antonio Quartulli f6226ae7a0 ovpn: introduce the ovpn_socket object
This specific structure is used in the ovpn kernel module
to wrap and carry around a standard kernel socket.

ovpn takes ownership of passed sockets and therefore an ovpn
specific objects is attached to them for status tracking
purposes.

Initially only UDP support is introduced. TCP will come in a later
patch.

Cc: willemdebruijn.kernel@gmail.com
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20250415-b4-ovpn-v26-6-577f6097b964@openvpn.net
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-17 12:30:02 +02:00
Antonio Quartulli c2d950c467 ovpn: add basic interface creation/destruction/management routines
Add basic infrastructure for handling ovpn interfaces.

Tested-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20250415-b4-ovpn-v26-3-577f6097b964@openvpn.net
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-17 12:30:02 +02:00
Antonio Quartulli b7a63391aa ovpn: add basic netlink support
This commit introduces basic netlink support with family
registration/unregistration functionalities and stub pre/post-doit.

More importantly it introduces the YAML uAPI description along
with its auto-generated files:
- include/uapi/linux/ovpn.h
- drivers/net/ovpn/netlink-gen.c
- drivers/net/ovpn/netlink-gen.h

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20250415-b4-ovpn-v26-2-577f6097b964@openvpn.net
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-17 12:30:02 +02:00
Mickaël Salaün 47ce2af848
landlock: Update log documentation
Fix and improve documentation related to landlock_restrict_self(2)'s
flags.  Update the LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF
documentation according to the current semantic.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250416154716.1799902-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-04-17 11:09:10 +02:00
Mickaël Salaün 25b1fc1cdc
landlock: Fix documentation for landlock_restrict_self(2)
Fix, deduplicate, and improve rendering of landlock_restrict_self(2)'s
flags documentation.

The flags are now rendered like the syscall's parameters and
description.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250416154716.1799902-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-04-17 11:09:10 +02:00
Mickaël Salaün 50492f942c
landlock: Fix documentation for landlock_create_ruleset(2)
Move and fix the flags documentation, and improve formatting.

It makes more sense and it eases maintenance to document syscall flags
in landlock.h, where they are defined.  This is already the case for
landlock_restrict_self(2)'s flags.

The flags are now rendered like the syscall's parameters and
description.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250416154716.1799902-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-04-17 11:09:07 +02:00
Pavel Begunkov 25744f8495 io_uring/zcrx: return ifq id to the user
IORING_OP_RECV_ZC requests take a zcrx object id via sqe::zcrx_ifq_idx,
which binds it to the corresponding if / queue. However, we don't return
that id back to the user. It's fine as currently there can be only one
zcrx and the user assumes that its id should be 0, but as we'll need
multiple zcrx objects in the future let's explicitly pass it back on
registration.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8714667d370651962f7d1a169032e5f02682a73e.1744722517.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15 07:37:49 -06:00
Luis Henriques 2396356a94 fuse: add more control over cache invalidation behaviour
Currently userspace is able to notify the kernel to invalidate the cache
for an inode.  This means that, if all the inodes in a filesystem need to
be invalidated, then userspace needs to iterate through all of them and do
this kernel notification separately.

This patch adds the concept of 'epoch': each fuse connection will have the
current epoch initialized and every new dentry will have it's d_time set to
the current epoch value.  A new operation will then allow userspace to
increment the epoch value.  Every time a dentry is d_revalidate()'ed, it's
epoch is compared with the current connection epoch and invalidated if it's
value is different.

Signed-off-by: Luis Henriques <luis@igalia.com>
Tested-by: Laura Promberger <laura.promberger@cern.ch>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-04-15 12:56:40 +02:00
David Howells 01af642697 rxrpc: Add the security index for yfs-rxgk
Add the security index and abort codes for the YFS variant of rxgk.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20250411095303.2316168-6-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14 17:36:41 -07:00
David Howells 5800b1cf3f rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE
Allow the app to request that CHALLENGEs be passed to it through an
out-of-band queue that allows recvmsg() to pick it up so that the app can
add data to it with sendmsg().

This will allow the application (AFS or userspace) to interact with the
process if it wants to and put values into user-defined fields.  This will
be used by AFS when talking to a fileserver to supply that fileserver with
a crypto key by which callback RPCs can be encrypted (ie. notifications
from the fileserver to the client).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250411095303.2316168-5-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14 17:36:41 -07:00
Joseph Huang 9fbe1e3e61 net: bridge: Add offload_fail_notification bopt
Add BR_BOOLOPT_MDB_OFFLOAD_FAIL_NOTIFICATION bool option.

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250411150323.1117797-3-Joseph.Huang@garmin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14 15:56:42 -07:00
Joseph Huang e846fb5e7c net: bridge: mcast: Add offload failed mdb flag
Add MDB_FLAGS_OFFLOAD_FAILED and MDB_PG_FLAGS_OFFLOAD_FAILED to indicate
that an attempt to offload the MDB entry to switchdev has failed.

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250411150323.1117797-2-Joseph.Huang@garmin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14 15:56:42 -07:00
Stefano Garzarella a940e0a685 vhost: fix VHOST_*_OWNER documentation
VHOST_OWNER_SET and VHOST_OWNER_RESET are used in the documentation
instead of VHOST_SET_OWNER and VHOST_RESET_OWNER respectively.

To avoid confusion, let's use the right names in the documentation.
No change to the API, only the documentation is involved.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20250303085237.19990-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-04-14 09:51:31 -04:00
Daniel Jurgens 16c22c56d4 virtio_pci: Use self group type for cap commands
Section 2.12.1.2 of v1.4 of the VirtIO spec states:

The device and driver capabilities commands are currently defined for
self group type.
1. VIRTIO_ADMIN_CMD_CAP_ID_LIST_QUERY
2. VIRTIO_ADMIN_CMD_DEVICE_CAP_GET
3. VIRTIO_ADMIN_CMD_DRIVER_CAP_SET

Fixes: bfcad51860 ("virtio: Manage device and driver capabilities via the admin commands")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Message-Id: <20250304161442.90700-1-danielj@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-04-14 09:51:31 -04:00
Eric Huang 4172b556fd drm/amdkfd: add smi events for process start and end
rocm-smi will be able to show the events for KFD process
start/end, it is the implementation of this feature.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 17:01:25 -04:00
Wesley Cheng 67890d5794 ALSA: Add USB audio device jack type
Add an USB jack type, in order to support notifying of a valid USB audio
device.  Since USB audio devices can have a slew of different
configurations that reach beyond the basic headset and headphone use cases,
classify these devices differently.

Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
Acked-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250409194804.3773260-8-quic_wcheng@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-11 13:02:30 +02:00
Jiayuan Chen c449d5f3a3 tcp: add LINUX_MIB_PAWS_TW_REJECTED counter
When TCP is in TIME_WAIT state, PAWS verification uses
LINUX_PAWSESTABREJECTED, which is ambiguous and cannot be distinguished
from other PAWS verification processes.

We added a new counter, like the existing PAWS_OLD_ACK one.

Also we update the doc with previously missing PAWS_OLD_ACK.

usage:
'''
nstat -az | grep PAWSTimewait
TcpExtPAWSTimewait              1                  0.0
'''

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250409112614.16153-3-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-10 18:29:26 -07:00
Paul Chaignon 5a15a050df bpf: Clarify the meaning of BPF_F_PSEUDO_HDR
In the bpf_l4_csum_replace helper, the BPF_F_PSEUDO_HDR flag should only
be set if the modified header field is part of the pseudo-header.

If you modify for example the UDP ports and pass BPF_F_PSEUDO_HDR,
inet_proto_csum_replace4 will update skb->csum even though it shouldn't
(the port and the UDP checksum updates null each other).

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/5126ef84ba75425b689482cbc98bffe75e5d8ab0.1744102490.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09 20:07:31 -07:00
Paul Chaignon b412fd6bcc bpf: Clarify role of BPF_F_RECOMPUTE_CSUM
BPF_F_RECOMPUTE_CSUM doesn't update the actual L3 and L4 checksums in
the packet, but simply updates skb->csum (according to skb->ip_summed).
This patch clarifies that to avoid confusions.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/ff6895d42936f03dbb82334d8bcfd50e00c79086.1744102490.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09 20:07:31 -07:00
Eric Biggers c07d3aede2 fscrypt: add support for hardware-wrapped keys
Add support for hardware-wrapped keys to fscrypt.  Such keys are
protected from certain attacks, such as cold boot attacks.  For more
information, see the "Hardware-wrapped keys" section of
Documentation/block/inline-encryption.rst.

To support hardware-wrapped keys in fscrypt, we allow the fscrypt master
keys to be hardware-wrapped.  File contents encryption is done by
passing the wrapped key to the inline encryption hardware via
blk-crypto.  Other fscrypt operations such as filenames encryption
continue to be done by the kernel, using the "software secret" which the
hardware derives.  For more information, see the documentation which
this patch adds to Documentation/filesystems/fscrypt.rst.

Note that this feature doesn't require any filesystem-specific changes.
However it does depend on inline encryption support, and thus currently
it is only applicable to ext4 and f2fs.

The version of this feature introduced by this patch is mostly
equivalent to the version that has existed downstream in the Android
Common Kernels since 2020.  However, a couple fixes are included.
First, the flags field in struct fscrypt_add_key_arg is now placed in
the proper location.  Second, key identifiers for HW-wrapped keys are
now derived using a distinct HKDF context byte; this fixes a bug where a
raw key could have the same identifier as a HW-wrapped key.  Note that
as a result of these fixes, the version of this feature introduced by
this patch is not UAPI or on-disk format compatible with the version in
the Android Common Kernels, though the divergence is limited to just
those specific fixes.  This version should be used going forwards.

This patch has been heavily rewritten from the original version by
Gaurav Kashyap <quic_gaurkash@quicinc.com> and
Barani Muthukumaran <bmuthuku@codeaurora.org>.

Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # sm8650
Link: https://lore.kernel.org/r/20250404225859.172344-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-04-08 19:32:11 -07:00
Jonas Karlman dcbe2aeda2 media: v4l2: Add NV15 and NV20 pixel formats
Add NV15 and NV20 pixel formats used by the Rockchip Video Decoder for
10-bit buffers.

NV15 and NV20 is 10-bit 4:2:0/4:2:2 semi-planar YUV formats similar to
NV12 and NV16, using 10-bit components with no padding between each
component. Instead, a group of 4 luminance/chrominance samples are
stored over 5 bytes in little endian order:

YYYY = UVUV = 4 * 10 bits = 40 bits = 5 bytes

The '15' and '20' suffix refers to the optimum effective bits per pixel
which is achieved when the total number of luminance samples is a
multiple of 8 for NV15 and 4 for NV20.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Tested-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Tested-by: Christopher Obbard <chris.obbard@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-04-08 07:21:21 +00:00
Thomas Zimmermann 1afba39f93 Merge drm/drm-next into drm-misc-next
Backmerging to get v6.15-rc1 into drm-misc-next. Also fixes a
build issue when enabling CONFIG_DRM_SCHED_KUNIT_TEST.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-04-07 14:35:48 +02:00
Paolo Bonzini fd02aa45bd Merge branch 'kvm-tdx-initial' into HEAD
This large commit contains the initial support for TDX in KVM.  All x86
parts enable the host-side hypercalls that KVM uses to talk to the TDX
module, a software component that runs in a special CPU mode called SEAM
(Secure Arbitration Mode).

The series is in turn split into multiple sub-series, each with a separate
merge commit:

- Initialization: basic setup for using the TDX module from KVM, plus
  ioctls to create TDX VMs and vCPUs.

- MMU: in TDX, private and shared halves of the address space are mapped by
  different EPT roots, and the private half is managed by the TDX module.
  Using the support that was added to the generic MMU code in 6.14,
  add support for TDX's secure page tables to the Intel side of KVM.
  Generic KVM code takes care of maintaining a mirror of the secure page
  tables so that they can be queried efficiently, and ensuring that changes
  are applied to both the mirror and the secure EPT.

- vCPU enter/exit: implement the callbacks that handle the entry of a TDX
  vCPU (via the SEAMCALL TDH.VP.ENTER) and the corresponding save/restore
  of host state.

- Userspace exits: introduce support for guest TDVMCALLs that KVM forwards to
  userspace.  These correspond to the usual KVM_EXIT_* "heavyweight vmexits"
  but are triggered through a different mechanism, similar to VMGEXIT for
  SEV-ES and SEV-SNP.

- Interrupt handling: support for virtual interrupt injection as well as
  handling VM-Exits that are caused by vectored events.  Exclusive to
  TDX are machine-check SMIs, which the kernel already knows how to
  handle through the kernel machine check handler (commit 7911f145de,
  "x86/mce: Implement recovery for errors in TDX/SEAM non-root mode")

- Loose ends: handling of the remaining exits from the TDX module, including
  EPT violation/misconfig and several TDVMCALL leaves that are handled in
  the kernel (CPUID, HLT, RDMSR/WRMSR, GetTdVmCallInfo); plus returning
  an error or ignoring operations that are not supported by TDX guests

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-07 07:36:33 -04:00
Nas Chung f81f69a0e3 media: uapi: v4l: Fix V4L2_TYPE_IS_OUTPUT condition
V4L2_TYPE_IS_OUTPUT() returns true for V4L2_BUF_TYPE_VIDEO_OVERLAY
which definitely belongs to CAPTURE.

Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-04-07 13:28:25 +02:00
Nas Chung ad2698efce media: uapi: v4l: Change V4L2_TYPE_IS_CAPTURE condition
Explicitly compare a buffer type only with valid buffer types,
to avoid matching a buffer type outside of the valid buffer type set.

Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
Reviewed-by: Michael Tretter <m.tretter@pengutronix.de>
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-04-07 13:28:25 +02:00
Anton Protopopov 62aa5790ce bpf: Fix a comment describing bpf_attr
The map_fd field of the bpf_attr union is used in the BPF_MAP_FREEZE
syscall.  Explicitly mention this in the comments.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250331203618.1973691-2-a.s.protopopov@gmail.com
2025-04-04 08:53:24 -07:00
Linus Torvalds 7930edcc3a io_uring-6.15-20250403
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfvB5IQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgnAEACYhVXXsIsoRqzhp2OTDEUPU8OLxe285h1s
 3u+erGnFiWUAxs54fbJjGCzqQSeAIcr2EuAWEyMdhixTwykoAdrlnKZzuH7jnHZ4
 FVkvOVlP6+7lRshJzr5FEsg5I7HxNMvFmnVKHoAsJRVqfiYbrRbIzSQokiWPE3lb
 a6ptr0itZgXjWftUIIioG6JBYuvZLY0OAqRQPvONB784hSwKxaqTBIY4LaWv+TBb
 SIwe26lXONSGtbQ7g/Vpkta6edcymEqiU1V4XPmjlal0DCMmN9jXY6gTBi/8JfwA
 DITSsDzXNvjAJfXclYzPk8sbNHlgZIha0+yqkVTQpxK8M9gahAAI1Z8TzkjXAG/G
 ttsXKq4DfP9tvytsaOc/blO/NSXiPLcPEmr9m1GYFL3c95tpYwhjSEZwcsyXvRbK
 Ooc3+s6ZYBWT8xO0pS1kRArYTILr6MR6fNbWO0H+f5+kNHSjXWl+0PwuYCpI5UW9
 z9BXVNqM6des7XI+7JRuaLhXoRTn6s5Wz8o4IP/xaNsKQLaosiRDjUxoAE/ZXmNx
 IvR+GCJsLU9xTOd97TIHuj+zfgnRbUiheRbCfpUCoTnmzkcWdqJhPCajtIEus5dI
 g5t2R5TnjRQObgOLTD0DVC2kdDjsOx1y0uDxBUNTBdUR+7TINX7jlkKurmSJMvEC
 MZp77CFcDA==
 =kf65
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-6.15-20250403' of git://git.kernel.dk/linux

Pull more io_uring updates from Jens Axboe:
 "Set of fixes/updates for io_uring that should go into this release.

  The ublk bits could've gone via either tree - usually I put them in
  block, but they got a bit mixed this series with the zero-copy
  supported that ended up dipping into both trees.

  This contains:

   - Fix for sendmsg zc, include in pinned pages accounting like we do
     for the other zc types

   - Series for ublk fixing request aborting, doing various little
     cleanups, fixing some zc issues, and adding queue_rqs support

   - Another ublk series doing some code cleanups

   - Series cleaning up the io_uring send path, mostly in preparation
     for registered buffers

   - Series doing little MSG_RING cleanups

   - Fix for the newly added zc rx, fixing len being 0 for the last
     invocation of the callback

   - Add vectored registered buffer support for ublk. With that, then
     ublk also supports this feature in the kernel revision where it
     could generically introduced for rw/net

   - A bunch of selftest additions for ublk. This is the majority of the
     diffstat

   - Silence a KCSAN data race warning for io-wq

   - Various little cleanups and fixes"

* tag 'io_uring-6.15-20250403' of git://git.kernel.dk/linux: (44 commits)
  io_uring: always do atomic put from iowq
  selftests: ublk: enable zero copy for stripe target
  io_uring: support vectored kernel fixed buffer
  block: add for_each_mp_bvec()
  io_uring: add validate_fixed_range() for validate fixed buffer
  selftests: ublk: kublk: fix an error log line
  selftests: ublk: kublk: use ioctl-encoded opcodes
  io_uring/zcrx: return early from io_zcrx_recv_skb if readlen is 0
  io_uring/net: avoid import_ubuf for regvec send
  io_uring/rsrc: check size when importing reg buffer
  io_uring: cleanup {g,s]etsockopt sqe reading
  io_uring: hide caches sqes from drivers
  io_uring: make zcrx depend on CONFIG_IO_URING
  io_uring: add req flag invariant build assertion
  Documentation: ublk: remove dead footnote
  selftests: ublk: specify io_cmd_buf pointer type
  ublk: specify io_cmd_buf pointer type
  io_uring: don't pass ctx to tw add remote helper
  io_uring/msg: initialise msg request opcode
  io_uring/msg: rename io_double_lock_ctx()
  ...
2025-04-03 15:48:58 -07:00
Linus Torvalds a1b5bd45d4 USB/Thunderbolt update for 6.15-rc1
Here is the big set of USB and Thunderbolt driver updates for 6.15-rc1.
 Included in here are:
   - Thunderbolt driver and core api updates for new hardware and
     features
   - usb-storage const array cleanups
   - typec driver updates
   - dwc3 driver updates
   - xhci driver updates and bugfixes
   - small USB documentation updates
   - usb cdns3 driver updates
   - usb gadget driver updates
   - other small driver updates and fixes
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ+2Zaw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynIgQCaAyMozdrZtTiOs1OcZEuTkoCtKrEAniqe0OiL
 s7R6j2NoOIwo9d6hBsjh
 =IH7I
 -----END PGP SIGNATURE-----

Merge tag 'usb-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB / Thunderbolt updates from Greg KH:
 "Here is the big set of USB and Thunderbolt driver updates for
  6.15-rc1. Included in here are:

   - Thunderbolt driver and core api updates for new hardware and
     features

   - usb-storage const array cleanups

   - typec driver updates

   - dwc3 driver updates

   - xhci driver updates and bugfixes

   - small USB documentation updates

   - usb cdns3 driver updates

   - usb gadget driver updates

   - other small driver updates and fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (92 commits)
  thunderbolt: Do not add non-active NVM if NVM upgrade is disabled for retimer
  thunderbolt: Scan retimers after device router has been enumerated
  usb: host: cdns3: forward lost power information to xhci
  usb: host: xhci-plat: allow upper layers to signal power loss
  usb: xhci: change xhci_resume() parameters to explicit the desired info
  usb: cdns3-ti: run HW init at resume() if HW was reset
  usb: cdns3-ti: move reg writes to separate function
  usb: cdns3: call cdns_power_is_lost() only once in cdns_resume()
  usb: cdns3: rename hibernated argument of role->resume() to lost_power
  usb: xhci: tegra: rename `runtime` boolean to `is_auto_runtime`
  usb: host: xhci-plat: mvebu: use ->quirks instead of ->init_quirk() func
  usb: dwc3: Don't use %pK through printk
  usb: core: Don't use %pK through printk
  usb: gadget: aspeed: Add NULL pointer check in ast_vhub_init_dev()
  dt-bindings: usb: qcom,dwc3: Synchronize minItems for interrupts and -names
  usb: common: usb-conn-gpio: switch psy_cfg from of_node to fwnode
  usb: xhci: Avoid Stop Endpoint retry loop if the endpoint seems Running
  usb: xhci: Don't change the status of stalled TDs on failed Stop EP
  xhci: Avoid queuing redundant Stop Endpoint command for stalled endpoint
  xhci: Handle spurious events on Etron host isoc enpoints
  ...
2025-04-02 18:23:31 -07:00
Linus Torvalds 5e17b5c717 fuse update for 6.15
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZ+vB5QAKCRDh3BK/laaZ
 PGA2AQCVsyLmZFinaNC10S+Bkmx+a7f9MLhX6u+ILbmio8nT1AD7BKCDFD9pucG0
 pilz+OaCXjXt/og6doyugM4SW/Q3tA0=
 =BhLT
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Allow connection to server to time out (Joanne Koong)

 - If server doesn't support creating a hard link, return EPERM rather
   than ENOSYS (Matt Johnston)

 - Allow file names longer than 1024 chars (Bernd Schubert)

 - Fix a possible race if request on io_uring queue is interrupted
   (Bernd Schubert)

 - Misc fixes and cleanups

* tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: remove unneeded atomic set in uring creation
  fuse: fix uring race condition for null dereference of fc
  fuse: Increase FUSE_NAME_MAX to PATH_MAX
  fuse: Allocate only namelen buf memory in fuse_notify_
  fuse: add default_request_timeout and max_request_timeout sysctls
  fuse: add kernel-enforced timeout option for requests
  fuse: optmize missing FUSE_LINK support
  fuse: Return EPERM rather than ENOSYS from link()
  fuse: removed unused function fuse_uring_create() from header
  fuse: {io-uring} Fix a possible req cancellation race
2025-04-02 16:36:59 -07:00
Linus Torvalds 48552153cf iommufd 6.15 merge window pull
Two significant new items:
 
 - Allow reporting IOMMU HW events to userspace when the events are clearly
   linked to a device. This is linked to the VIOMMU object and is intended to
   be used by a VMM to forward HW events to the virtual machine as part of
   emulating a vIOMMU. ARM SMMUv3 is the first driver to use this
   mechanism. Like the existing fault events the data is delivered through
   a simple FD returning event records on read().
 
 - PASID support in VFIO. "Process Address Space ID" is a PCI feature that
   allows the device to tag all PCI DMA operations with an ID. The IOMMU
   will then use the ID to select a unique translation for those DMAs. This
   is part of Intel's vIOMMU support as VT-D HW requires the hypervisor to
   manage each PASID entry. The support is generic so any VFIO user could
   attach any translation to a PASID, and the support should work on ARM
   SMMUv3 as well. AMD requires additional driver work.
 
 Some minor updates, along with fixes:
 
 - Prevent using nested parents with fault's, no driver support today
 
 - Put a single "cookie_type" value in the iommu_domain to indicate what
   owns the various opaque owner fields
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZ+q6NgAKCRCFwuHvBreF
 YZ3zAQDbl4/Z0O+CLN2AXq4Zeiyq1HTSoF94hzqmm7lQ17zTIwD8CCdyLXHvupaq
 tkBIv5IovpaxlrSk6M0kh2K8vPCk9Qk=
 =CIM3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
 "Two significant new items:

   - Allow reporting IOMMU HW events to userspace when the events are
     clearly linked to a device.

     This is linked to the VIOMMU object and is intended to be used by a
     VMM to forward HW events to the virtual machine as part of
     emulating a vIOMMU. ARM SMMUv3 is the first driver to use this
     mechanism. Like the existing fault events the data is delivered
     through a simple FD returning event records on read().

   - PASID support in VFIO.

     The "Process Address Space ID" is a PCI feature that allows the
     device to tag all PCI DMA operations with an ID. The IOMMU will
     then use the ID to select a unique translation for those DMAs. This
     is part of Intel's vIOMMU support as VT-D HW requires the
     hypervisor to manage each PASID entry.

     The support is generic so any VFIO user could attach any
     translation to a PASID, and the support should work on ARM SMMUv3
     as well. AMD requires additional driver work.

  Some minor updates, along with fixes:

   - Prevent using nested parents with fault's, no driver support today

   - Put a single "cookie_type" value in the iommu_domain to indicate
     what owns the various opaque owner fields"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits)
  iommufd: Test attach before detaching pasid
  iommufd: Fix iommu_vevent_header tables markup
  iommu: Convert unreachable() to BUG()
  iommufd: Balance veventq->num_events inc/dec
  iommufd: Initialize the flags of vevent in iommufd_viommu_report_event()
  iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO
  iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid
  vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices
  ida: Add ida_find_first_range()
  iommufd/selftest: Add coverage for iommufd pasid attach/detach
  iommufd/selftest: Add test ops to test pasid attach/detach
  iommufd/selftest: Add a helper to get test device
  iommufd/selftest: Add set_dev_pasid in mock iommu
  iommufd: Allow allocating PASID-compatible domain
  iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  iommufd: Enforce PASID-compatible domain for RID
  iommufd: Support pasid attach/replace
  iommufd: Enforce PASID-compatible domain in PASID path
  iommufd/device: Add pasid_attach array to track per-PASID attach
  ...
2025-04-01 18:03:46 -07:00
Linus Torvalds 25601e8544 Char/Misc/IIO driver updates for 6.15-rc1
Here is the big set of char, misc, iio, and other smaller driver
 subsystems for 6.15-rc1.  Lots of stuff in here, including:
   - loads of IIO changes and driver updates
   - counter driver updates
   - w1 driver updates
   - faux conversions for some drivers that were abusing the platform bus
     interface
   - coresight driver updates
   - rust miscdevice binding updates based on real-world-use
   - other minor driver updates
 
 All of these have been in linux-next with no reported issues for quite a
 while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ+mNdQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylktACfYJix41jCCDbiFjnu7Hz4OIdcrUsAnRyF164M
 1n5MhEhsEmvQj7WBwQLE
 =AmmW
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / misc / IIO driver updates from Greg KH:
 "Here is the big set of char, misc, iio, and other smaller driver
  subsystems for 6.15-rc1. Lots of stuff in here, including:

   - loads of IIO changes and driver updates

   - counter driver updates

   - w1 driver updates

   - faux conversions for some drivers that were abusing the platform
     bus interface

   - coresight driver updates

   - rust miscdevice binding updates based on real-world-use

   - other minor driver updates

  All of these have been in linux-next with no reported issues for quite
  a while"

* tag 'char-misc-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
  samples: rust_misc_device: fix markup in top-level docs
  Coresight: Fix a NULL vs IS_ERR() bug in probe
  misc: lis3lv02d: convert to use faux_device
  tlclk: convert to use faux_device
  regulator: dummy: convert to use the faux device interface
  bus: mhi: host: Fix race between unprepare and queue_buf
  coresight: configfs: Constify struct config_item_type
  doc: iio: ad7380: describe offload support
  iio: ad7380: add support for SPI offload
  iio: light: Add check for array bounds in veml6075_read_int_time_ms
  iio: adc: ti-ads7924 Drop unnecessary function parameters
  staging: iio: ad9834: Use devm_regulator_get_enable()
  staging: iio: ad9832: Use devm_regulator_get_enable()
  iio: gyro: bmg160_spi: add of_match_table
  dt-bindings: iio: adc: Add i.MX94 and i.MX95 support
  iio: adc: ad7768-1: remove unnecessary locking
  Documentation: ABI: add wideband filter type to sysfs-bus-iio
  iio: adc: ad7768-1: set MOSI idle state to prevent accidental reset
  iio: adc: ad7768-1: Fix conversion result sign
  iio: adc: ad7124: Benefit of dev = indio_dev->dev.parent in ad7124_parse_channel_config()
  ...
2025-04-01 11:26:08 -07:00
Linus Torvalds d6b02199cd - The 7 patch series "powerpc/crash: use generic crashkernel
reservation" from Sourabh Jain changes powerpc's kexec code to use more
   of the generic layers.
 
 - The 2 patch series "get_maintainer: report subsystem status
   separately" from Vlastimil Babka makes some long-requested improvements
   to the get_maintainer output.
 
 - The 4 patch series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the ucount
   code.
 
 - The 12 patch series "reboot: support runtime configuration of
   emergency hw_protection action" from Ahmad Fatoum improves the ability
   for a driver to perform an emergency system shutdown or reboot.
 
 - The 16 patch series "Converge on using secs_to_jiffies() part two"
   from Easwar Hariharan performs further migrations from
   msecs_to_jiffies() to secs_to_jiffies().
 
 - The 7 patch series "lib/interval_tree: add some test cases and
   cleanup" from Wei Yang permits more userspace testing of kernel library
   code, adds some more tests and performs some cleanups.
 
 - The 2 patch series "hung_task: Dump the blocking task stacktrace" from
   Masami Hiramatsu arranges for the hung_task detector to dump the stack
   of the blocking task and not just that of the blocked task.
 
 - The 4 patch series "resource: Split and use DEFINE_RES*() macros" from
   Andy Shevchenko provides some cleanups to the resource definition
   macros.
 
 - Plus the usual shower of singleton patches - please see the individual
   changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nuqwAKCRDdBJ7gKXxA
 jtNqAQDxqJpjWkzn4yN9CNSs1ivVx3fr6SqazlYCrt3u89WQvwEA1oRrGpETzUGq
 r6khQUIcQImPPcjFqEFpuiSOU0MBZA0=
 =Kii8
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
2025-04-01 10:06:52 -07:00
Linus Torvalds b6dde1e527 NFSD 6.15 Release Notes
Neil Brown contributed more scalability improvements to NFSD's
 open file cache, and Jeff Layton contributed a menagerie of
 repairs to NFSD's NFSv4 callback / backchannel implementation.
 
 Mike Snitzer contributed a change to NFS re-export support that
 disables support for file locking on a re-exported NFSv4 mount.
 This is because NFSv4 state recovery is currently difficult if
 not impossible for re-exported NFS mounts. The change aims to
 prevent data integrity exposures after the re-export server
 crashes.
 
 Work continues on the evolving NFSD netlink administrative API.
 
 Many thanks to the contributors, reviewers, testers, and bug
 reporters who participated during the v6.15 development cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmfmpMIACgkQM2qzM29m
 f5f6DA/+P0YqoRg3Zk/4oWwXZWbfEOMhWFltT+D1PE2QjUfOZpiwUSFQfsfYgXO6
 OFu0iDQ4g8BxBeP6Umv61qy7Cv6n4fVzIHqzymXQvymh9JzoQiXlE9/fA8nAHuiH
 u7kkNPRi7faBz1sMg/WpN9CHctg7STPOhhG/JrZcSFZnh87mU1i4i4bZBNz8tVnK
 ZWf483OUuSmJY2/bUTkwvr4GbceTKBlLWFFjiRhfAKvJBWvu4myfC0DI5QzxmsgI
 MJ62do7AFJP1ww2Ih9LLi2kFIt/yyInSVAgyts1CPhlJ4BfPnTSOw/i2+CuF3D/M
 bZYEAOjH3AqjBZmq58sIQezpD5f9/TOrTSwYwS31zl/THYE413WiW80/MDoWqo0y
 9cSNkD3nJlPVLLCfF58vXLoe7wpLoN/ZbTdxoozzUWEFR5A4Jz3XP8F/Cws0cjem
 uWWAQMItiQpg1+RYJYfu4dg5+iN6dbgYbvzlr7buISwFNXi3Zo99MkJ4wHj9TJbL
 Tpjth1rWGPwwSOMT6ojKiYMq1oUzx5PuAm9Saq9oIzQAbBySmxHF/LSDz3wEuBoO
 MK1jzKroEmMk3fJOOAajSDLOdAbL3vfj6H/xi2IHvKnaz9yHCZNu2YGV05BBMprd
 hWePf69AO5Ky5Q9KuGClEtwvJ9ZR5pb4DO2dqaYu8ximu3O4vPo=
 =e2E2
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Neil Brown contributed more scalability improvements to NFSD's open
  file cache, and Jeff Layton contributed a menagerie of repairs to
  NFSD's NFSv4 callback / backchannel implementation.

  Mike Snitzer contributed a change to NFS re-export support that
  disables support for file locking on a re-exported NFSv4 mount. This
  is because NFSv4 state recovery is currently difficult if not
  impossible for re-exported NFS mounts. The change aims to prevent data
  integrity exposures after the re-export server crashes.

  Work continues on the evolving NFSD netlink administrative API.

  Many thanks to the contributors, reviewers, testers, and bug reporters
  who participated during the v6.15 development cycle"

* tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits)
  NFSD: Add a Kconfig setting to enable delegated timestamps
  sysctl: Fixes nsm_local_state bounds
  nfsd: use a long for the count in nfsd4_state_shrinker_count()
  nfsd: remove obsolete comment from nfs4_alloc_stid
  nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault()
  nfsd: reorganize struct nfs4_delegation for better packing
  nfsd: handle errors from rpc_call_async()
  nfsd: move cb_need_restart flag into cb_flags
  nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
  nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY
  nfsd: prevent callback tasks running concurrently
  nfsd: disallow file locking and delegations for NFSv4 reexport
  nfsd: filecache: drop the list_lru lock during lock gc scans
  nfsd: filecache: don't repeatedly add/remove files on the lru list
  nfsd: filecache: introduce NFSD_FILE_RECENT
  nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc()
  nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync()
  NFSD: Re-organize nfsd_file_gc_worker()
  nfsd: filecache: remove race handling.
  fs: nfs: acl: Avoid -Wflex-array-member-not-at-end warning
  ...
2025-03-31 17:28:17 -07:00
Joanne Koong 0f6439f61a fuse: add kernel-enforced timeout option for requests
There are situations where fuse servers can become unresponsive or
stuck, for example if the server is deadlocked. Currently, there's no
good way to detect if a server is stuck and needs to be killed manually.

This commit adds an option for enforcing a timeout (in seconds) for
requests where if the timeout elapses without the server responding to
the request, the connection will be automatically aborted.

Please note that these timeouts are not 100% precise. For example, the
request may take roughly an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond
the requested timeout due to internal implementation, in order to
mitigate overhead.

[SzM: Bump the API version number]

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-03-31 14:59:25 +02:00
Linus Torvalds fa593d0f96 bpf-next-6.15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmfi6ZAACgkQ6rmadz2v
 bTpLOg/+J7xUddPMhlpFAUlifQEadE5hmw6v1tXpM3zyKHzUWJiv/qsx3j8/ckgD
 D+d4P8bqIbI9SSuIS4oZ0+D9pr/g7GYztnoYZmPiYJ7v2AijPuof5dsagFQE8E2y
 rhfbt9KHTMzzkdkTvaAZaITS/HWAoJ2YVRB6gfLex2ghcXYHcgmtKRZniQrbBiFZ
 MIXBN8Rg6HP+pUdIVllSXFcQCb3XIgjPONRAos4hr5tIm+3Ku7Jvkgk2H/9vUcoF
 bdXAcg8xygyH7eY+1l3e7nEPQlG0jUZEsL+tq+vpdoLRLqlIpAUYmwUvqcmq4dPS
 QGFjiUcpDbXlxsUFpzjXHIFto7fXCfND7HEICQPwAncdflIIfYaATSQUfkEexn0a
 wBCFlAChrEzAmg2vFl4EeEr0fdSe/3jswrgKx0m6ctKieMjgloBUeeH4fXOpfkhS
 9tvhuduVFuronlebM8ew4w9T/mBgbyxkE5KkvP4hNeB3ni3N0K6Mary5/u2HyN1e
 lqTlnZxRA4p6lrvxce/mDrR4VSwlKLcSeQVjxAL1afD5KRkuZJnUv7bUhS361vkG
 IjNrQX30EisDAz+X7tMn3ndBf9vVatwFT4+c3yaxlQRor1WofhDfT88HPiyB4QqQ
 Kdx2EHgbQxJp4vkzhp4/OXlTfkihsMEn8egzZuphdPEQ9Y+Jdwg=
 =aN/V
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:
 "For this merge window we're splitting BPF pull request into three for
  higher visibility: main changes, res_spin_lock, try_alloc_pages.

  These are the main BPF changes:

   - Add DFA-based live registers analysis to improve verification of
     programs with loops (Eduard Zingerman)

   - Introduce load_acquire and store_release BPF instructions and add
     x86, arm64 JIT support (Peilin Ye)

   - Fix loop detection logic in the verifier (Eduard Zingerman)

   - Drop unnecesary lock in bpf_map_inc_not_zero() (Eric Dumazet)

   - Add kfunc for populating cpumask bits (Emil Tsalapatis)

   - Convert various shell based tests to selftests/bpf/test_progs
     format (Bastien Curutchet)

   - Allow passing referenced kptrs into struct_ops callbacks (Amery
     Hung)

   - Add a flag to LSM bpf hook to facilitate bpf program signing
     (Blaise Boscaccy)

   - Track arena arguments in kfuncs (Ihor Solodrai)

   - Add copy_remote_vm_str() helper for reading strings from remote VM
     and bpf_copy_from_user_task_str() kfunc (Jordan Rome)

   - Add support for timed may_goto instruction (Kumar Kartikeya
     Dwivedi)

   - Allow bpf_get_netns_cookie() int cgroup_skb programs (Mahe Tardy)

   - Reduce bpf_cgrp_storage_busy false positives when accessing cgroup
     local storage (Martin KaFai Lau)

   - Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko)

   - Allow retrieving BTF data with BTF token (Mykyta Yatsenko)

   - Add BPF kfuncs to set and get xattrs with 'security.bpf.' prefix
     (Song Liu)

   - Reject attaching programs to noreturn functions (Yafang Shao)

   - Introduce pre-order traversal of cgroup bpf programs (Yonghong
     Song)"

* tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits)
  selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid
  bpf: Fix out-of-bounds read in check_atomic_load/store()
  libbpf: Add namespace for errstr making it libbpf_errstr
  bpf: Add struct_ops context information to struct bpf_prog_aux
  selftests/bpf: Sanitize pointer prior fclose()
  selftests/bpf: Migrate test_xdp_vlan.sh into test_progs
  selftests/bpf: test_xdp_vlan: Rename BPF sections
  bpf: clarify a misleading verifier error message
  selftests/bpf: Add selftest for attaching fexit to __noreturn functions
  bpf: Reject attaching fexit/fmod_ret to __noreturn functions
  bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage
  bpf: Make perf_event_read_output accessible in all program types.
  bpftool: Using the right format specifiers
  bpftool: Add -Wformat-signedness flag to detect format errors
  selftests/bpf: Test freplace from user namespace
  libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID
  bpf: Return prog btf_id without capable check
  bpf: BPF token support for BPF_BTF_GET_FD_BY_ID
  bpf, x86: Fix objtool warning for timed may_goto
  bpf: Check map->record at the beginning of check_and_free_fields()
  ...
2025-03-30 12:43:03 -07:00
Linus Torvalds f90f2145b2 s390 updates for 6.15 merge window
- Add sorting of mcount locations at build time
 
 - Rework uaccess functions with C exception handling to shorten inline
   assembly size and enable full inlining. This yields near-optimal code
   for small constant copies with a ~40kb kernel size increase
 
 - Add support for a configurable STRICT_MM_TYPECHECKS which allows to
   generate better code, but also allows to have type checking for
   debug builds
 
 - Optimize get_lowcore() for common callers with alternatives that
   nearly revert to the pre-relocated lowcore code, while also slightly
   reducing syscall entry and exit time
 
 - Convert MACHINE_HAS_* checks for single facility tests into cpu_has_*
   style macros that call test_facility(), and for features with additional
   conditions, add a new ALT_TYPE_FEATURE alternative to provide a static
   branch via alternative patching. Also, move machine feature detection
   to the decompressor for early patching and add debugging functionality
   to easily show which alternatives are patched
 
 - Add exception table support to early boot / startup code to get rid
   of the open coded exception handling
 
 - Use asm_inline for all inline assemblies with EX_TABLE or ALTERNATIVE
   to ensure correct inlining and unrolling decisions
 
 - Remove 2k page table leftovers now that s390 has been switched to
   always allocate 4k page tables
 
 - Split kfence pool into 4k mappings in arch_kfence_init_pool() and
   remove the architecture-specific kfence_split_mapping()
 
 - Use READ_ONCE_NOCHECK() in regs_get_kernel_stack_nth() to silence
   spurious KASAN warnings from opportunistic ftrace argument tracing
 
 - Force __atomic_add_const() variants on s390 to always return void,
   ensuring compile errors for improper usage
 
 - Remove s390's ioremap_wt() and pgprot_writethrough() due to mismatched
   semantics and lack of known users, relying on asm-generic fallbacks
 
 - Signal eventfd in vfio-ap to notify userspace when the guest AP
   configuration changes, including during mdev removal
 
 - Convert mdev_types from an array to a pointer in vfio-ccw and vfio-ap
   drivers to avoid fake flex array confusion
 
 - Cleanup trap code
 
 - Remove references to the outdated linux390@de.ibm.com address
 
 - Other various small fixes and improvements all over the code
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmfmuPwACgkQjYWKoQLX
 FBgTDAgAjKmZ5OYjACRfYepTvKk9SDqa2CBlQZ+BhbAXEVIrxKnv8OkImAXoWNsM
 mFxiCxAHWdcD+nqTrxFsXhkNLsndijlwnj/IqZgvy6R/3yNtBlAYRPLujOmVrsQB
 dWB8Dl38p63Ip1JfAqyabiAOUjfhrclRcM5FX5tgciXA6N/vhY3OM6k0+k7wN4Nj
 Dei/rCrnYRXTrFQgtM4w8JTIrwdnXjeKvaTYCflh4Q5ISJ7TceSF7cqq8HOs5hhK
 o2ciaoTdx212522CIsxeN3Ls3jrn8bCOCoOeSCysc5RL84grAuFnmjSajo1LFide
 S/TQtHXYy78Wuei9xvHi561ogiv/ww==
 =Kxgc
 -----END PGP SIGNATURE-----

Merge tag 's390-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Add sorting of mcount locations at build time

 - Rework uaccess functions with C exception handling to shorten inline
   assembly size and enable full inlining. This yields near-optimal code
   for small constant copies with a ~40kb kernel size increase

 - Add support for a configurable STRICT_MM_TYPECHECKS which allows to
   generate better code, but also allows to have type checking for debug
   builds

 - Optimize get_lowcore() for common callers with alternatives that
   nearly revert to the pre-relocated lowcore code, while also slightly
   reducing syscall entry and exit time

 - Convert MACHINE_HAS_* checks for single facility tests into cpu_has_*
   style macros that call test_facility(), and for features with
   additional conditions, add a new ALT_TYPE_FEATURE alternative to
   provide a static branch via alternative patching. Also, move machine
   feature detection to the decompressor for early patching and add
   debugging functionality to easily show which alternatives are patched

 - Add exception table support to early boot / startup code to get rid
   of the open coded exception handling

 - Use asm_inline for all inline assemblies with EX_TABLE or ALTERNATIVE
   to ensure correct inlining and unrolling decisions

 - Remove 2k page table leftovers now that s390 has been switched to
   always allocate 4k page tables

 - Split kfence pool into 4k mappings in arch_kfence_init_pool() and
   remove the architecture-specific kfence_split_mapping()

 - Use READ_ONCE_NOCHECK() in regs_get_kernel_stack_nth() to silence
   spurious KASAN warnings from opportunistic ftrace argument tracing

 - Force __atomic_add_const() variants on s390 to always return void,
   ensuring compile errors for improper usage

 - Remove s390's ioremap_wt() and pgprot_writethrough() due to
   mismatched semantics and lack of known users, relying on asm-generic
   fallbacks

 - Signal eventfd in vfio-ap to notify userspace when the guest AP
   configuration changes, including during mdev removal

 - Convert mdev_types from an array to a pointer in vfio-ccw and vfio-ap
   drivers to avoid fake flex array confusion

 - Cleanup trap code

 - Remove references to the outdated linux390@de.ibm.com address

 - Other various small fixes and improvements all over the code

* tag 's390-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (78 commits)
  s390: Use inline qualifier for all EX_TABLE and ALTERNATIVE inline assemblies
  s390/kfence: Split kfence pool into 4k mappings in arch_kfence_init_pool()
  s390/ptrace: Avoid KASAN false positives in regs_get_kernel_stack_nth()
  s390/boot: Ignore vmlinux.map
  s390/sysctl: Remove "vm/allocate_pgste" sysctl
  s390: Remove 2k vs 4k page table leftovers
  s390/tlb: Use mm_has_pgste() instead of mm_alloc_pgste()
  s390/lowcore: Use lghi instead llilh to clear register
  s390/syscall: Merge __do_syscall() and do_syscall()
  s390/spinlock: Implement SPINLOCK_LOCKVAL with inline assembly
  s390/smp: Implement raw_smp_processor_id() with inline assembly
  s390/current: Implement current with inline assembly
  s390/lowcore: Use inline qualifier for get_lowcore() inline assembly
  s390: Move s390 sysctls into their own file under arch/s390
  s390/syscall: Simplify syscall_get_arguments()
  s390/vfio-ap: Notify userspace that guest's AP config changed when mdev removed
  s390: Remove ioremap_wt() and pgprot_writethrough()
  s390/mm: Add configurable STRICT_MM_TYPECHECKS
  s390/mm: Convert pgste_val() into function
  s390/mm: Convert pgprot_val() into function
  ...
2025-03-29 11:59:43 -07:00
Linus Torvalds e5e0e6bebe This update includes the following changes:
API:
 
 - Remove legacy compression interface.
 - Improve scatterwalk API.
 - Add request chaining to ahash and acomp.
 - Add virtual address support to ahash and acomp.
 - Add folio support to acomp.
 - Remove NULL dst support from acomp.
 
 Algorithms:
 
 - Library options are fuly hidden (selected by kernel users only).
 - Add Kerberos5 algorithms.
 - Add VAES-based ctr(aes) on x86.
 - Ensure LZO respects output buffer length on compression.
 - Remove obsolete SIMD fallback code path from arm/ghash-ce.
 
 Drivers:
 
 - Add support for PCI device 0x1134 in ccp.
 - Add support for rk3588's standalone TRNG in rockchip.
 - Add Inside Secure SafeXcel EIP-93 crypto engine support in eip93.
 - Fix bugs in tegra uncovered by multi-threaded self-test.
 - Fix corner cases in hisilicon/sec2.
 
 Others:
 
 - Add SG_MITER_LOCAL to sg miter.
 - Convert ubifs, hibernate and xfrm_ipcomp from legacy API to acomp.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmfiQ9kACgkQxycdCkmx
 i6fFZg/9GWjC1FLEV66vNlYAIzFGwzwWdFGyQzXyP235Cphhm4qt9gx7P91N6Lvc
 pplVjNEeZHoP8lMw+AIeGc2cRhIwsvn8C+HA3tCBOoC1qSe8T9t7KHAgiRGd/0iz
 UrzVBFLYlR9i4tc0T5peyQwSctv8DfjWzduTmI3Ts8i7OQcfeVVgj3sGfWam7kjF
 1GJWIQH7aPzT8cwFtk8gAK1insuPPZelT1Ppl9kUeZe0XUibrP7Gb5G9simxXAyi
 B+nLCaJYS6Hc1f47cfR/qyZSeYQN35KTVrEoKb1pTYXfEtMv6W9fIvQVLJRYsqpH
 RUBdDJUseE+WckR6glX9USrh+Fv9d+HfsTXh1fhpApKU5sQJ7pDbUm4ge8p6htNG
 MIszbJPdqajYveRLuPUjFlUXaqomos8eT6BZA+RLHm1cogzEOm+5bjspbfRNAVPj
 x9KiDu5lXNiFj02v/MkLKUe3bnGIyVQnZNi7Rn0Rpxjv95tIjVpksZWMPJarxUC6
 5zdyM2I5X0Z9+teBpbfWyqfzSbAs/KpzV8S/xNvWDUT6NlpYGBeNXrCDTXcwJLAh
 PRW0w1EJUwsZbPi8GEh5jNzo/YK1cGsUKrihKv7YgqSSopMLI8e/WVr8nKZMVDFA
 O+6F6ec5lR7KsOIMGUqrBGFU1ccAeaLLvLK3H5J8//gMMg82Uik=
 =aQNt
 -----END PGP SIGNATURE-----

Merge tag 'v6.15-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Remove legacy compression interface
   - Improve scatterwalk API
   - Add request chaining to ahash and acomp
   - Add virtual address support to ahash and acomp
   - Add folio support to acomp
   - Remove NULL dst support from acomp

  Algorithms:
   - Library options are fuly hidden (selected by kernel users only)
   - Add Kerberos5 algorithms
   - Add VAES-based ctr(aes) on x86
   - Ensure LZO respects output buffer length on compression
   - Remove obsolete SIMD fallback code path from arm/ghash-ce

  Drivers:
   - Add support for PCI device 0x1134 in ccp
   - Add support for rk3588's standalone TRNG in rockchip
   - Add Inside Secure SafeXcel EIP-93 crypto engine support in eip93
   - Fix bugs in tegra uncovered by multi-threaded self-test
   - Fix corner cases in hisilicon/sec2

  Others:
   - Add SG_MITER_LOCAL to sg miter
   - Convert ubifs, hibernate and xfrm_ipcomp from legacy API to acomp"

* tag 'v6.15-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (187 commits)
  crypto: testmgr - Add multibuffer acomp testing
  crypto: acomp - Fix synchronous acomp chaining fallback
  crypto: testmgr - Add multibuffer hash testing
  crypto: hash - Fix synchronous ahash chaining fallback
  crypto: arm/ghash-ce - Remove SIMD fallback code path
  crypto: essiv - Replace memcpy() + NUL-termination with strscpy()
  crypto: api - Call crypto_alg_put in crypto_unregister_alg
  crypto: scompress - Fix incorrect stream freeing
  crypto: lib/chacha - remove unused arch-specific init support
  crypto: remove obsolete 'comp' compression API
  crypto: compress_null - drop obsolete 'comp' implementation
  crypto: cavium/zip - drop obsolete 'comp' implementation
  crypto: zstd - drop obsolete 'comp' implementation
  crypto: lzo - drop obsolete 'comp' implementation
  crypto: lzo-rle - drop obsolete 'comp' implementation
  crypto: lz4hc - drop obsolete 'comp' implementation
  crypto: lz4 - drop obsolete 'comp' implementation
  crypto: deflate - drop obsolete 'comp' implementation
  crypto: 842 - drop obsolete 'comp' implementation
  crypto: nx - Migrate to scomp API
  ...
2025-03-29 10:01:55 -07:00
Linus Torvalds 7d06015d93 pci-v6.15-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmfmt1AUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxqNw//cC1BlVe4UUVR5r9nfpoFeGAZeDJz
 32naGvZKjwL0tR6dStO/BEZx4QrBp+smVfJfuxtQxfzLHLgMigM2jVhfa7XUmaun
 7yZlJZu4Jmydc57sPf56CFOYOFP6zyPzSaE8u1Eb4IIqpvuoYpvDayDt6PSsLmFS
 PDzqmicT3nuNbbcfE4rYLyL6JsXooKCR1h+NNcDjy7Run9DvQbE6N2PPvXCu6O97
 aC3+kYUydEpgn9DfjBDghe+GBQCkBPldwnWqXBxKDFmYj5bKFujNccS9/IDSEuuX
 oWntDRAXgLWg048sBC1AuJQajF3UaqffRGJkzUBaZWbU/jB9t5N/Z3GpYlXzizRx
 CAqnt/ciGUKVbaESKwoeeIqgK+wG1bnrmoEaJHXFGqjr6sjm2A2T5EzyBMJ1hFwE
 wUq6SDnkp5igG7rWtsBPo/lGa5h/pNlaXng11570ikD2ZfHVfRgwy2MpXYxChrkt
 X2q/lRYU7yyfNJQ8O5LQJ6bYztatjxT0TxXNFv+cxfVrdI7vnQMuaeBE352jn+Lo
 aZ8fTJDFbRXtrZolcoetZjBdHwPIS42wYQtvxo/ylUl64xKEzZEzN09XODjwn74v
 nuOzDtbM0TAjZWxi6bwRFPnemTEAQxPuv1i4VdPQ7yblvC0at2agNBsz6Fdq60GS
 s80YMJVUU0UcVoc=
 =gM1i
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Enable Configuration RRS SV, which makes device readiness visible,
     early instead of during child bus scanning (Bjorn Helgaas)

   - Log debug messages about reset methods being used (Bjorn Helgaas)

   - Avoid reset when it has been disabled via sysfs (Nishanth
     Aravamudan)

   - Add common pci-ep-bus.yaml schema for exporting several peripherals
     of a single PCI function via devicetree (Andrea della Porta)

   - Create DT nodes for PCI host bridges to enable loading device tree
     overlays to create platform devices for PCI devices that have
     several features that require multiple drivers (Herve Codina)

  Resource management:

   - Enlarge devres table[] to accommodate bridge windows, ROM, IOV
     BARs, etc., and validate BAR index in devres interfaces (Philipp
     Stanner)

   - Fix typo that repeatedly distributed resources to a bridge instead
     of iterating over subordinate bridges, which resulted in too little
     space to assign some BARs (Kai-Heng Feng)

   - Relax bridge window tail sizing for optional resources, e.g., IOV
     BARs, to avoid failures when removing and re-adding devices (Ilpo
     Järvinen)

   - Allow drivers to enable devices even if we haven't assigned
     optional IOV resources to them (Ilpo Järvinen)

   - Rework handling of optional resources (IOV BARs, ROMs) to reduce
     failures if we can't allocate them (Ilpo Järvinen)

   - Fix a NULL dereference in the SR-IOV VF creation error path (Shay
     Drory)

   - Fix s390 mmio_read/write syscalls, which didn't cause page faults
     in some cases, which broke vfio-pci lazy mapping on first access
     (Niklas Schnelle)

   - Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which
     was disabled only for s390 (Niklas Schnelle)

   - Support mmap of PCI resources on s390 except for ISM devices
     (Niklas Schnelle)

  ASPM:

   - Delay pcie_link_state deallocation to avoid dangling pointers that
     cause invalid references during hot-unplug (Daniel Stodden)

  Power management:

   - Allow PCI bridges to go to D3Hot when suspending on all non-x86
     systems (Manivannan Sadhasivam)

  Power control:

   - Create pwrctrl devices in pci_scan_device() to make it more
     symmetric with pci_pwrctrl_unregister() and make pwrctrl devices
     for PCI bridges possible (Manivannan Sadhasivam)

   - Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc.
     can still access devices after pci_stop_dev() (Manivannan
     Sadhasivam)

   - If there's a pwrctrl device for a PCI device, skip scanning it
     because the pwrctrl core will rescan the bus after the device is
     powered on (Manivannan Sadhasivam)

   - Add a pwrctrl driver for PCI slots based on voltage regulators
     described via devicetree (Manivannan Sadhasivam)

  Bandwidth control:

   - Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
     set_pcie_cooling_state.sh test case (Yi Lai)

   - Avoid a NULL pointer dereference when we run out of bus numbers to
     assign for a bridge secondary bus (Lukas Wunner)

  Hotplug:

   - Drop superfluous pci_hotplug_slot_list, try_module_get() calls, and
     NULL pointer checks (Lukas Wunner)

   - Drop shpchp module init/exit logging, replace shpchp dbg() with
     ctrl_dbg(), and remove unused dbg(), err(), info(), warn() wrappers
     (Ilpo Järvinen)

   - Drop 'shpchp_debug' module parameter in favor of standard dynamic
     debugging (Ilpo Järvinen)

   - Drop unused cpcihp .get_power(), .set_power() function pointers
     (Guilherme Giacomo Simoes)

   - Disable hotplug interrupts in portdrv only when pciehp is not
     enabled to avoid issuing two hotplug commands too close together
     (Feng Tang)

   - Skip pciehp 'device replaced' check if the device has been removed
     to address a deadlock when resuming after a device was removed
     during system sleep (Lukas Wunner)

   - Don't enable pciehp hotplug interupt when resuming in poll mode
     (Ilpo Järvinen)

  Virtualization:

   - Fix bugs in 'pci=config_acs=' kernel command line parameter (Tushar
     Dave)

  DOE:

   - Expose supported DOE features via sysfs (Alistair Francis)

   - Allow DOE support to be enabled even if CXL isn't enabled (Alistair
     Francis)

  Endpoint framework:

   - Convert PCI device data so pci-epf-test works correctly on
     big-endian endpoint systems (Niklas Cassel)

   - Add BAR_RESIZABLE type to endpoint framework and add DWC core
     support for EPF drivers to set BAR_RESIZABLE type and size (Niklas
     Cassel)

   - Fix pci-epf-test double free that causes an oops if the host
     reboots and PERST# deassertion restarts endpoint BAR allocation
     (Christian Bruel)

   - Fix endpoint BAR testing so tests can skip disabled BARs instead of
     reporting them as failures (Niklas Cassel)

   - Widen endpoint test BAR size variable to accommodate BARs larger
     than INT_MAX (Niklas Cassel)

   - Remove unused tools 'pci' build target left over after moving tests
     to tools/testing/selftests/pci_endpoint (Jianfeng Liu)

  Altera PCIe controller driver:

   - Add DT binding and driver support for Agilex family (P-Tile,
     F-Tile, R-Tile) (Matthew Gerlach and D M, Sharath Kumar)

  AMD MDB PCIe controller driver:

   - Add DT binding and driver for AMD MDB (Multimedia DMA Bridge)
     (Thippeswamy Havalige)

  Broadcom STB PCIe controller driver:

   - Add BCM2712 MSI-X DT binding and interrupt controller drivers and
     add softdep on irq_bcm2712_mip driver to ensure that it is loaded
     first (Stanimir Varbanov)

   - Expand inbound window map to 64GB so it can accommodate BCM2712
     (Stanimir Varbanov)

   - Add BCM2712 support and DT updates (Stanimir Varbanov)

   - Apply link speed restriction before bringing link up, not after
     (Jim Quinlan)

   - Update Max Link Speed in Link Capabilities via the internal
     writable register, not the read-only config register (Jim Quinlan)

   - Handle regulator_bulk_get() error to avoid panic when we call
     regulator_bulk_free() later (Jim Quinlan)

   - Disable regulators only when removing the bus immediately below a
     Root Port because we don't support regulators deeper in the
     hierarchy (Jim Quinlan)

   - Make const read-only arrays static (Colin Ian King)

  Cadence PCIe endpoint driver:

   - Correct MSG TLP generation so endpoints can generate INTx messages
     (Hans Zhang)

  Freescale i.MX6 PCIe controller driver:

   - Identify the second controller on i.MX8MQ based on devicetree
     'linux,pci-domain' instead of DBI 'reg' address (Richard Zhu)

   - Remove imx_pcie_cpu_addr_fixup() since dwc core can now derive the
     ATU input address (using parent_bus_offset) from devicetree (Frank
     Li)

  Freescale Layerscape PCIe controller driver:

   - Drop deprecated 'num-ib-windows' and 'num-ob-windows' and
     unnecessary 'status' from example (Krzysztof Kozlowski)

   - Correct the syscon_regmap_lookup_by_phandle_args("fsl,pcie-scfg")
     arg_count to fix probe failure on LS1043A (Ioana Ciornei)

  HiSilicon STB PCIe controller driver:

   - Call phy_exit() to clean up if histb_pcie_probe() fails (Christophe
     JAILLET)

  Intel Gateway PCIe controller driver:

   - Remove intel_pcie_cpu_addr() since dwc core can now derive the ATU
     input address (using parent_bus_offset) from devicetree (Frank Li)

  Intel VMD host bridge driver:

   - Convert vmd_dev.cfg_lock from spinlock_t to raw_spinlock_t so
     pci_ops.read() will never sleep, even on PREEMPT_RT where
     spinlock_t becomes a sleepable lock, to avoid calling a sleeping
     function from invalid context (Ryo Takakura)

  MediaTek PCIe Gen3 controller driver:

   - Remove leftover mac_reset assert for Airoha EN7581 SoC (Lorenzo
     Bianconi)

   - Add EN7581 PBUS controller 'mediatek,pbus-csr' DT property and
     program host bridge memory aperture to this syscon node (Lorenzo
     Bianconi)

  Qualcomm PCIe controller driver:

   - Add qcom,pcie-ipq5332 binding (Varadarajan Narayanan)

   - Add qcom i.MX8QM and i.MX8QXP/DXP optional DMA interrupt (Alexander
     Stein)

   - Add optional dma-coherent DT property for Qualcomm SA8775P (Dmitry
     Baryshkov)

   - Make DT iommu property required for SA8775P and prohibited for
     SDX55 (Dmitry Baryshkov)

   - Add DT IOMMU and DMA-related properties for Qualcomm SM8450 (Dmitry
     Baryshkov)

   - Add endpoint DT properties for SAR2130P and enable endpoint mode in
     driver (Dmitry Baryshkov)

   - Describe endpoint BAR0 and BAR2 as 64-bit only and BAR1 and BAR3 as
     RESERVED (Manivannan Sadhasivam)

  Rockchip DesignWare PCIe controller driver:

   - Describe rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas
     Cassel)

  Synopsys DesignWare PCIe controller driver:

   - Add debugfs-based Silicon Debug, Error Injection, Statistical
     Counter support for DWC (Shradha Todi)

   - Add debugfs property to expose LTSSM status of DWC PCIe link (Hans
     Zhang)

   - Add Rockchip support for DWC debugfs features (Niklas Cassel)

   - Add dw_pcie_parent_bus_offset() to look up the parent bus address
     of a specified 'reg' property and return the offset from the CPU
     physical address (Frank Li)

   - Use dw_pcie_parent_bus_offset() to derive CPU -> ATU addr offset
     via 'reg[config]' for host controllers and 'reg[addr_space]' for
     endpoint controllers (Frank Li)

   - Apply struct dw_pcie.parent_bus_offset in ATU users to remove use
     of .cpu_addr_fixup() when programming ATU (Frank Li)

  TI J721E PCIe driver:

   - Correct the 'link down' interrupt bit for J784S4 (Siddharth
     Vadapalli)

  TI Keystone PCIe controller driver:

   - Describe AM65x BARs 2 and 5 as Resizable (not Fixed) and reduce
     alignment requirement from 1MB to 64KB (Niklas Cassel)

  Xilinx Versal CPM PCIe controller driver:

   - Free IRQ domain in probe error path to avoid leaking it
     (Thippeswamy Havalige)

   - Add DT .compatible "xlnx,versal-cpm5nc-host" and driver support for
     Versal Net CPM5NC Root Port controller (Thippeswamy Havalige)

   - Add driver support for CPM5_HOST1 (Thippeswamy Havalige)

  Miscellaneous:

   - Convert fsl,mpc83xx-pcie binding to YAML (J. Neuschäfer)

   - Use for_each_available_child_of_node_scoped() to simplify apple,
     kirin, mediatek, mt7621, tegra drivers (Zhang Zekun)"

* tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (197 commits)
  PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by_phandle_args()
  PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4
  misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
  PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
  PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
  PCI: endpoint: Add intx_capable to epc_features struct
  dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
  PCI: intel-gw: Remove intel_pcie_cpu_addr()
  PCI: imx6: Remove imx_pcie_cpu_addr_fixup()
  PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup()
  PCI: dwc: ep: Ensure proper iteration over outbound map windows
  PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offset
  PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources()
  PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init()
  PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offset
  PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debug
  PCI: dwc: Add dw_pcie_parent_bus_offset()
  PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
  PCI: xilinx-cpm: Add cpm_csr register mapping for CPM5_HOST1 variant
  PCI: brcmstb: Make const read-only arrays static
  ...
2025-03-28 19:36:53 -07:00
Linus Torvalds 0c86b42439 drm for 6.15-rc1
uapi:
 - add mediatek tiled fourcc
 - add support for notifying userspace on device wedged
 
 new driver:
 - appletbdrm: support for Apple Touchbar displays on m1/m2
 - nova-core: skeleton rust driver to develop nova inside off
 
 firmware:
 - add some rust firmware pieces
 
 rust:
 - add 'LocalModule' type alias
 
 component:
 - add helper to query bound status
 
 fbdev:
 - fbtft: remove access to page->index
 
 media:
 - cec: tda998x: import driver from drm
 
 dma-buf:
 - add fast path for single fence merging
 
 tests:
 - fix lockdep warnings
 
 atomic:
 - allow full modeset on connector changes
 - clarify semantics of allow_modeset and drm_atomic_helper_check
 - async-flip: support on arbitary planes
 - writeback: fix UAF
 - Document atomic-state history
 
 format-helper:
 - support ARGB8888 to ARGB4444 conversions
 
 buddy:
 - fix multi-root cleanup
 
 ci:
 - update IGT
 
 dp:
 - support extended wake timeout
 - mst: fix RAD to string conversion
 - increase DPCD eDP control CAP size to 5 bytes
 - add DPCD eDP v1.5 definition
 - add helpers for LTTPR transparent mode
 
 panic:
 - encode QR code according to Fido 2.2
 
 scheduler:
 - add parameter struct for init
 - improve job peek/pop operations
 - optimise drm_sched_job struct layout
 
 ttm:
 - refactor pool allocation
 - add helpers for TTM shrinker
 
 panel-orientation:
 - add a bunch of new quirks
 
 panel:
 - convert panels to multi-style functions
 - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3,
   LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006,
   Lenovo T14s Gen6 Snapdragon
 - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay
   kd110n11-51ie, Starry 2082109qfh040022-50e
 - visionox-r66451: use multi-style MIPI-DSI functions
 - raydium-rm67200: Add driver for Raydium RM67200
 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17
 - sony-td4353-jdi: Use MIPI-DSI multi-func interface
 - summit: Add driver for Apple Summit display panel
 - visionox-rm692e5: Add driver for Visionox RM692E5
 
 bridge:
 - pass full atomic state to various callbacks
 - adv7511: Report correct capabilities
 - it6505: Fix HDCP V compare
 - snd65dsi86: fix device IDs
 - nwl-dsi: set bridge type
 - ti-sn65si83: add error recovery and set bridge type
 - synopsys: add HDMI audio support
 
 xe:
 - support device-wedged event
 - add mmap support for PCI memory barrier
 - perf pmu integration and expose per-engien activity
 - add EU stall sampling support
 - GPU SVM and Xe SVM implementation
 - use TTM shrinker
 - add survivability mode to allow the driver to do
   firmware updates in critical failure states
 - PXP HWDRM support for MTL and LNL
 - expose package/vram temps over hwmon
 - enable DP tunneling
 - drop mmio_ext abstraction
 - Reject BO evcition if BO is bound to current VM
 - Xe suballocator improvements
 - re-use display vmas when possible
 - add GuC Buffer Cache abstraction
 - PCI ID update for Panther Lake and Battlemage
 - Enable SRIOV for Panther Lake
 - Refactor VRAM manager location
 
 i915:
 - enable extends wake timeout
 - support device-wedged event
 - Enable DP 128b/132b SST DSC
 - FBC dirty rectangle support for display version 30+
 - convert i915/xe to drm client setup
 - Compute HDMI PLLS for rates not in fixed tables
 - Allow DSB usage when PSR is enabled on LNL+
 - Enable panel replay without full modeset
 - Enable async flips with compressed buffers on ICL+
 - support luminance based brightness via DPCD for eDP
 - enable VRR enable/disable without full modeset
 - allow GuC SLPC default strategies on MTL+ for performance
 - lots of display refactoring in move to struct intel_display
 
 amdgpu:
 - add device wedged event
 - support async page flips on overlay planes
 - enable broadcast RGB drm property
 - add info ioctl for virt mode
 - OEM i2c support for RGB lights
 - GC 11.5.2 + 11.5.3 support
 - SDMA 6.1.3 support
 - NBIO 7.9.1 + 7.11.2 support
 - MMHUB 1.8.1 + 3.3.2 support
 - DCN 3.6.0 support
 - Add dynamic workload profile switching for GC 10-12
 - support larger VBIOS sizes
 - Mark gttsize parameters as deprecated
 - Initial JPEG queue resset support
 
 amdkfd:
 - add KFD per process flags for setting precision
 - sync pasid values between KGD and KFD
 - improve GTT/VRAM handling for APUs
 - fix user queue validation on GC7/8
 - SDMA queue reset support
 
 raedeon:
 - rs400 hyperz fix
 
 i2c:
 - td998x: drop platform_data, split driver into media and bridge
 
 ast:
 - transmitter chip detection refactoring
 - vbios display mode refactoring
 - astdp: fix connection status and filter unsupported modes
 - cursor handling refactoring
 
 imagination:
 - check job dependencies with sched helper
 
 ivpu:
 - improve command queue handling
 - use workqueue for IRQ handling
 - add support HW fault injection
 - locking fixes
 
 mgag200:
 - add support for G200eH5
 
 msm:
 - dpu: add concurrent writeback support for DPU 10.x+
 - use LTTPR helpers
 - GPU:
   - Fix obscure GMU suspend failure
   - Expose syncobj timeline support
   - Extend GPU devcoredump with pagetable info
   - a623 support
   - Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot / devcoredump
 - Display:
   - Add cpu-cfg interconnect paths on SM8560 and SM8650
   - Introduce KMS OMMU fault handler, causing devcoredump snapshot
   - Fixed error pointer dereference in msm_kms_init_aspace()
 - DPU:
   - Fix mode_changing handling
   - Add writeback support on SM6150 (QCS615)
   - Fix DSC programming in 1:1:1 topology
   - Reworked hardware resource allocation, moving it to the CRTC code
   - Enabled support for Concurrent WriteBack (CWB) on SM8650
   - Enabled CDM blocks on all relevant platforms
   - Reworked debugfs interface for BW/clocks debugging
   - Clear perf params before calculating bw
   - Support YUV formats on writeback
   - Fixed double inclusion
   - Fixed writeback in YUV formats when using cloned output, Dropped
     wb2_formats_rgb
   - Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt
     kerneldocs
   - Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode()
 - DSI:
   - DSC-related fixes
   - Rework clock programming
 - DSI PHY:
   - Fix 7nm (and lower) PHY programming
   - Add proper DT schema definitions for DSI PHY clocks
 - HDMI:
   - Rework the driver, enabling the use of the HDMI Connector framework
 - Bindings:
   - Added eDP PHY on SA8775P
 
 nouveau:
 - move drm_slave_encoder interface into driver
 - nvkm: refactor GSP RPC
 - use LTTPR helpers
 
 mediatek:
 - HDMI fixup and refinement
 - add MT8188 dsc compatible
 - MT8365 SoC support
 
 panthor:
 - Expose sizes of intenral BOs via fdinfo
 - Fix race between reset and suspend
 - Improve locking
 
 qaic:
 - Add support for AIC200
 
 renesas:
 - Fix limits in DT bindings
 
 rockchip:
 - support rk3562-mali
 - rk3576: Add HDMI support
 - vop2: Add new display modes on RK3588 HDMI0 up to 4K
 - Don't change HDMI reference clock rate
 - Fix DT bindings
 - analogix_dp: add eDP support
 - fix shutodnw
 
 solomon:
 - Set SPI device table to silence warnings
 - Fix pixel and scanline encoding
 
 v3d:
 - handle clock
 
 vc4:
 - Use drm_exec
 - Use dma-resv for wait-BO ioctl
 - Remove seqno infrastructure
 
 virtgpu:
 - Support partial mappings of GEM objects
 - Reserve VGA resources during initialization
 - Fix UAF in virtgpu_dma_buf_free_obj()
 - Add panic support
 
 vkms:
 - Switch to a managed modesetting pipeline
 - Add support for ARGB8888
 - fix UAf
 
 xlnx:
 - Set correct DMA segment size
 - use mutex guards
 - Fix error handling
 - Fix docs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmfmA2cACgkQDHTzWXnE
 hr7lKQ/+I7gmln3ka8FyKnpwG5KusDpxz3OzgUKHpzOTkXL1+vPMt0vjCKRJKE3D
 zfpUTWNbwlVN0krqmUGyFIeCt8wmevBF6HvQ+GsWbiWltUj3xIlnkV0TVH84XTUo
 0evrXNG9K8sLjMKjrf7yrGL53Ayoaq9IO9wtOws+FCgtykAsMR/IWLrYLpj21ZQ6
 Oclhq5Cz21WRoQpzySR23s3DLi4LHri26RGKbGNh2PzxYwyP/euGW6O+ncEduNmg
 vQLgUfptaM/EubJFG6jxDWZJ2ChIAUUxQuhZwt7DKqRsYIcJKcfDSUzqL95t6SYU
 zewlYmeslvusoesCeTJzHaUj34yqJGsvjFPsHFUEvAy8BVncsqS40D6mhrRMo5nD
 chnSJu+IpDOEEqcdbb1J73zKLw6X3GROM8qUQEThJBD2yTsOdw9d0HXekMDMBXSi
 NayKvXfsBV19rI74FYnRCzHt8BVVANh5qJNnR5RcnPZ2KzHQbV0JFOA9YhxE8vFU
 GWkFWKlpAQUQ+yoTy1kuO9dcUxLIC7QseMeS5BYhcJBMEV78xQuRYRxgsL8YS4Yg
 rIhcb3mZwMFj7jBAqfpLKWiI+GTup+P9vcz7Bvm5iIf8gEhZLUTwqBeAYXQkcWd4
 i1AqDuFR0//sAgODoeU2sg1Q3yTL/i/DhcwvCIPgcDZ/x4Eg868=
 =oK/N
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "Outside of drm there are some rust patches from Danilo who maintains
  that area in here, and some pieces for drm header check tests.

  The major things in here are a new driver supporting the touchbar
  displays on M1/M2, the nova-core stub driver which is just the vehicle
  for adding rust abstractions and start developing a real driver inside
  of.

  xe adds support for SVM with a non-driver specific SVM core
  abstraction that will hopefully be useful for other drivers, along
  with support for shrinking for TTM devices. I'm sure xe and AMD
  support new devices, but the pipeline depth on these things is hard to
  know what they end up being in the marketplace!

  uapi:
   - add mediatek tiled fourcc
   - add support for notifying userspace on device wedged

  new driver:
   - appletbdrm: support for Apple Touchbar displays on m1/m2
   - nova-core: skeleton rust driver to develop nova inside off

  firmware:
   - add some rust firmware pieces

  rust:
   - add 'LocalModule' type alias

  component:
   - add helper to query bound status

  fbdev:
   - fbtft: remove access to page->index

  media:
   - cec: tda998x: import driver from drm

  dma-buf:
   - add fast path for single fence merging

  tests:
   - fix lockdep warnings

  atomic:
   - allow full modeset on connector changes
   - clarify semantics of allow_modeset and drm_atomic_helper_check
   - async-flip: support on arbitary planes
   - writeback: fix UAF
   - Document atomic-state history

  format-helper:
   - support ARGB8888 to ARGB4444 conversions

  buddy:
   - fix multi-root cleanup

  ci:
   - update IGT

  dp:
   - support extended wake timeout
   - mst: fix RAD to string conversion
   - increase DPCD eDP control CAP size to 5 bytes
   - add DPCD eDP v1.5 definition
   - add helpers for LTTPR transparent mode

  panic:
   - encode QR code according to Fido 2.2

  scheduler:
   - add parameter struct for init
   - improve job peek/pop operations
   - optimise drm_sched_job struct layout

  ttm:
   - refactor pool allocation
   - add helpers for TTM shrinker

  panel-orientation:
   - add a bunch of new quirks

  panel:
   - convert panels to multi-style functions
   - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3,
     LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry
     116KHD024006, Lenovo T14s Gen6 Snapdragon
   - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay
     kd110n11-51ie, Starry 2082109qfh040022-50e
   - visionox-r66451: use multi-style MIPI-DSI functions
   - raydium-rm67200: Add driver for Raydium RM67200
   - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17
   - sony-td4353-jdi: Use MIPI-DSI multi-func interface
   - summit: Add driver for Apple Summit display panel
   - visionox-rm692e5: Add driver for Visionox RM692E5

  bridge:
   - pass full atomic state to various callbacks
   - adv7511: Report correct capabilities
   - it6505: Fix HDCP V compare
   - snd65dsi86: fix device IDs
   - nwl-dsi: set bridge type
   - ti-sn65si83: add error recovery and set bridge type
   - synopsys: add HDMI audio support

  xe:
   - support device-wedged event
   - add mmap support for PCI memory barrier
   - perf pmu integration and expose per-engien activity
   - add EU stall sampling support
   - GPU SVM and Xe SVM implementation
   - use TTM shrinker
   - add survivability mode to allow the driver to do firmware updates
     in critical failure states
   - PXP HWDRM support for MTL and LNL
   - expose package/vram temps over hwmon
   - enable DP tunneling
   - drop mmio_ext abstraction
   - Reject BO evcition if BO is bound to current VM
   - Xe suballocator improvements
   - re-use display vmas when possible
   - add GuC Buffer Cache abstraction
   - PCI ID update for Panther Lake and Battlemage
   - Enable SRIOV for Panther Lake
   - Refactor VRAM manager location

  i915:
   - enable extends wake timeout
   - support device-wedged event
   - Enable DP 128b/132b SST DSC
   - FBC dirty rectangle support for display version 30+
   - convert i915/xe to drm client setup
   - Compute HDMI PLLS for rates not in fixed tables
   - Allow DSB usage when PSR is enabled on LNL+
   - Enable panel replay without full modeset
   - Enable async flips with compressed buffers on ICL+
   - support luminance based brightness via DPCD for eDP
   - enable VRR enable/disable without full modeset
   - allow GuC SLPC default strategies on MTL+ for performance
   - lots of display refactoring in move to struct intel_display

  amdgpu:
   - add device wedged event
   - support async page flips on overlay planes
   - enable broadcast RGB drm property
   - add info ioctl for virt mode
   - OEM i2c support for RGB lights
   - GC 11.5.2 + 11.5.3 support
   - SDMA 6.1.3 support
   - NBIO 7.9.1 + 7.11.2 support
   - MMHUB 1.8.1 + 3.3.2 support
   - DCN 3.6.0 support
   - Add dynamic workload profile switching for GC 10-12
   - support larger VBIOS sizes
   - Mark gttsize parameters as deprecated
   - Initial JPEG queue resset support

  amdkfd:
   - add KFD per process flags for setting precision
   - sync pasid values between KGD and KFD
   - improve GTT/VRAM handling for APUs
   - fix user queue validation on GC7/8
   - SDMA queue reset support

  raedeon:
   - rs400 hyperz fix

  i2c:
   - td998x: drop platform_data, split driver into media and bridge

  ast:
   - transmitter chip detection refactoring
   - vbios display mode refactoring
   - astdp: fix connection status and filter unsupported modes
   - cursor handling refactoring

  imagination:
   - check job dependencies with sched helper

  ivpu:
   - improve command queue handling
   - use workqueue for IRQ handling
   - add support HW fault injection
   - locking fixes

  mgag200:
   - add support for G200eH5

  msm:
   - dpu: add concurrent writeback support for DPU 10.x+
   - use LTTPR helpers
   - GPU:
     - Fix obscure GMU suspend failure
     - Expose syncobj timeline support
     - Extend GPU devcoredump with pagetable info
     - a623 support
     - Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot /
       devcoredump
   - Display:
     - Add cpu-cfg interconnect paths on SM8560 and SM8650
     - Introduce KMS OMMU fault handler, causing devcoredump snapshot
     - Fixed error pointer dereference in msm_kms_init_aspace()
   - DPU:
     - Fix mode_changing handling
     - Add writeback support on SM6150 (QCS615)
     - Fix DSC programming in 1:1:1 topology
     - Reworked hardware resource allocation, moving it to the CRTC code
     - Enabled support for Concurrent WriteBack (CWB) on SM8650
     - Enabled CDM blocks on all relevant platforms
     - Reworked debugfs interface for BW/clocks debugging
     - Clear perf params before calculating bw
     - Support YUV formats on writeback
     - Fixed double inclusion
     - Fixed writeback in YUV formats when using cloned output, Dropped
       wb2_formats_rgb
     - Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt
       kerneldocs
     - Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode()
   - DSI:
     - DSC-related fixes
     - Rework clock programming
   - DSI PHY:
     - Fix 7nm (and lower) PHY programming
     - Add proper DT schema definitions for DSI PHY clocks
   - HDMI:
     - Rework the driver, enabling the use of the HDMI Connector
       framework
   - Bindings:
     - Added eDP PHY on SA8775P

  nouveau:
   - move drm_slave_encoder interface into driver
   - nvkm: refactor GSP RPC
   - use LTTPR helpers

  mediatek:
   - HDMI fixup and refinement
   - add MT8188 dsc compatible
   - MT8365 SoC support

  panthor:
   - Expose sizes of intenral BOs via fdinfo
   - Fix race between reset and suspend
   - Improve locking

  qaic:
   - Add support for AIC200

  renesas:
   - Fix limits in DT bindings

  rockchip:
   - support rk3562-mali
   - rk3576: Add HDMI support
   - vop2: Add new display modes on RK3588 HDMI0 up to 4K
   - Don't change HDMI reference clock rate
   - Fix DT bindings
   - analogix_dp: add eDP support
   - fix shutodnw

  solomon:
   - Set SPI device table to silence warnings
   - Fix pixel and scanline encoding

  v3d:
   - handle clock

  vc4:
   - Use drm_exec
   - Use dma-resv for wait-BO ioctl
   - Remove seqno infrastructure

  virtgpu:
   - Support partial mappings of GEM objects
   - Reserve VGA resources during initialization
   - Fix UAF in virtgpu_dma_buf_free_obj()
   - Add panic support

  vkms:
   - Switch to a managed modesetting pipeline
   - Add support for ARGB8888
   - fix UAf

  xlnx:
   - Set correct DMA segment size
   - use mutex guards
   - Fix error handling
   - Fix docs"

* tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernel: (1762 commits)
  drm/amd/pm: Update feature list for smu_v13_0_6
  drm/amdgpu: Add parameter documentation for amdgpu_sync_fence
  drm/amdgpu/discovery: optionally use fw based ip discovery
  drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asics
  drm/amdgpu/discovery: check ip_discovery fw file available
  drm/amd/pm: Remove unnecessay UQ10 to UINT conversion
  drm/amd/pm: Remove unnecessay UQ10 to UINT conversion
  drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA
  drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush
  drm/amd/amdgpu: Increase max rings to enable SDMA page ring
  drm/amdgpu: Decode deferred error type in gfx aca bank parser
  drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5 GPUs
  drm/amdgpu/mes: clean up SDMA HQD loop
  drm/amdgpu/mes: enable compute pipes across all MEC
  drm/amdgpu/mes: drop MES 10.x leftovers
  drm/amdgpu/mes: optimize compute loop handling
  drm/amdgpu/sdma: guilty tracking is per instance
  drm/amdgpu/sdma: fix engine reset handling
  drm/amdgpu: remove invalid usage of sched.ready
  drm/amdgpu: add cleaner shader trace point
  ...
2025-03-28 17:44:52 -07:00
Ming Lei ebf695f129 ublk: add segment parameter
IO split is usually bad in io_uring world, since -EAGAIN is caused and
IO handling may have to fallback to io-wq, this way does hurt performance.

ublk starts to support zero copy recently, for avoiding unnecessary IO
split, ublk driver's segment limit should be aligned with backend
device's segment limit.

Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
which number is aligned with ublk request segment number, so that big
memory allocation can be avoided by setting reasonable max_segments limit.

So add segment parameter for providing ublk server chance to align
segment limit with backend, and keep it reasonable from implementation
viewpoint.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250327095123.179113-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 16:15:43 -06:00
Linus Torvalds eff5f16bfd for-6.15/io_uring-reg-vec-20250327
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmflYcAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmvJD/4tKQlr0yRhln/JzPiONS41mUAuNRI4MdqJ
 ykpQkMx3NcQANbNyOxI0PV5I7y1Jdlg/UP9gy11BrIaBk4Kqoluc6iAzgr5q9pBC
 8pXhPIe80R/q/LOKEz9n5gqOMPNyUtd7IaBayJPBJre/YZXQu+49IL2Uyy3hss8d
 neqAbWErd2FoUfTY14XB3ImLM6a76Z6CjE3pJYvVDM5uRBuH0IGqehJJuNpsViBf
 M9XAW/HZt8ISsVt1tJbCQVWx4b63L/omHI8u5K2M0isTPV+QPk1O2Vgkn7dBrDeT
 JvThWrM1uE++DYGcQ3DXHfb3gBIFEjTrNb2nddstyEU2ZaEXUkuOV2O0b7WPuphj
 zp0oFaLl/ivHT8NoJzzZzK24zt99Qz43GWUaFCQeR0R8oTix/M1q0unguER45Iv7
 Po/b3h6+RAi+87KOlM5WWo05ScswS8AwcSUsP5xMR5BjjD+GQYO5PmVVyo8w0rid
 8F9U9DpN2CTA5YVjI+ax1cxWMOfmAXPK5ONjzZpyJoWb0THgj97esEwc2un7SBi7
 TJJz7Gc9/xOqfRKaPDoH9t8+b6ruWHMqCYDw6exSAUKeDxQ+7z0zNMudHkuR5VrX
 x+Taaj95ONLVNZYz0mbFcvmJC0UBOqkE94omXk7TU2Cn7SBzAW//XDep6CPpX/sa
 LcmOK4UXdg==
 =vOm1
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15/io_uring-reg-vec-20250327' of git://git.kernel.dk/linux

Pull more io_uring updates from Jens Axboe:
 "Final separate updates for io_uring.

  This started out as a series of cleanups improvements and improvements
  for registered buffers, but as the last series of the io_uring changes
  for 6.15, it also collected a few fixes for the other branches on top:

   - Add support for vectored fixed/registered buffers.

     Previously only single segments have been supported for commands,
     now vectored variants are supported as well. This series includes
     networking and file read/write support.

   - Small series unifying return codes across multi and single shot.

   - Small series cleaning up registerd buffer importing.

   - Adding support for vectored registered buffers for uring_cmd.

   - Fix for io-wq handling of command reissue.

   - Various little fixes and tweaks"

* tag 'for-6.15/io_uring-reg-vec-20250327' of git://git.kernel.dk/linux: (25 commits)
  io_uring/net: fix io_req_post_cqe abuse by send bundle
  io_uring/net: use REQ_F_IMPORT_BUFFER for send_zc
  io_uring: move min_events sanitisation
  io_uring: rename "min" arg in io_iopoll_check()
  io_uring: open code __io_post_aux_cqe()
  io_uring: defer iowq cqe overflow via task_work
  io_uring: fix retry handling off iowq
  io_uring/net: only import send_zc buffer once
  io_uring/cmd: introduce io_uring_cmd_import_fixed_vec
  io_uring/cmd: add iovec cache for commands
  io_uring/cmd: don't expose entire cmd async data
  io_uring: rename the data cmd cache
  io_uring: rely on io_prep_reg_vec for iovec placement
  io_uring: introduce io_prep_reg_iovec()
  io_uring: unify STOP_MULTISHOT with IOU_OK
  io_uring: return -EAGAIN to continue multishot
  io_uring: cap cached iovec/bvec size
  io_uring/net: implement vectored reg bufs for zctx
  io_uring/net: convert to struct iou_vec
  io_uring/net: pull vec alloc out of msghdr import
  ...
2025-03-28 15:07:04 -07:00
Linus Torvalds 6df9d086ff for-6.15/io_uring-epoll-wait-20250325
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfjTRYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvfMD/4596ZTo0bzeimMMzFloabhaqcMC9eO21iW
 oNb6KdSKQzAOdQl3+YfhSrP/ZbeoJl8y0YoPGIA2RBd4ELl/4YPVTNeahNhz70P2
 iawXcYmMyrM+6W6RXof9uj+k8mE17G7N+M7DuMYjfMnC/OhDKg/3DbbydwesMW47
 /1naMsk/rJx6CbeVKAl3V2es3NPUrZlh/dsurm5muGqXHAUmv459w46togKNvUhM
 tAGSq54zyPLnvnuA4Cm1QBIYtN5dHmjDjMmEXFiEF1qg7c5WIfVpkopc/SjvvlQs
 fVwN9Mn9Mh1VXuLysSqa5RsXINtXbJdpyXnh8pG/QqhntftNRKkwMxlN0gEYfBzB
 4iuvzUpGmPAwPhYJ60Ylci4KhWY31mwbU+ns9+/G4bIdDxdEMJGvy4D/oz/fUr9p
 1gwAIkJgfixb6fsUrL6REFoSEmRQQ7hQkyKRFyvXSXsTtqXz2BR7O75ujK381J9d
 Sa3zexDBRaJWlDWspRe7Olcwf/hqLmY3cWwQ8Z+a2vozU2dAtYpKr5Lhr7SjuftI
 Uw4pCkrXUG5EhdMSNuQnzVZv3xWPGAAUaAIT0MhDcWakhNT1dyyjbuXWuFzDmKQI
 0Q71+UPBW300eJx8bFupYQ1mKp10LbjewCxiXOkBFZxaG7n5UwGsv4FoE3CJ8QEP
 V0ou6ZpBFA==
 =AZkA
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15/io_uring-epoll-wait-20250325' of git://git.kernel.dk/linux

Pull io_uring epoll support from Jens Axboe:
 "This adds support for reading epoll events via io_uring.

  While this may seem counter-intuitive (and/or productive), the
  reasoning here is that quite a few existing epoll event loops can
  easily do a partial conversion to a completion based model, but are
  still stuck with one (or few) event types that remain readiness based.

  For that case, they then need to add the io_uring fd to the epoll
  context, and continue to rely on epoll_wait(2) for waiting on events.
  This misses out on the finer grained waiting that io_uring can do, to
  reduce context switches and wait for multiple events in one batch
  reliably.

  With adding support for reaping epoll events via io_uring, the whole
  legacy readiness based event types can still be reaped via epoll, with
  the overall waiting in the loop be driven by io_uring"

* tag 'for-6.15/io_uring-epoll-wait-20250325' of git://git.kernel.dk/linux:
  io_uring/epoll: add support for IORING_OP_EPOLL_WAIT
  io_uring/epoll: remove CONFIG_EPOLL guards
2025-03-28 14:55:32 -07:00
Linus Torvalds ca0b04ba0b for-6.15/io_uring-rx-zc-20250325
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfjTP8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpm6oEACnpGL52FAKTVj14GDqFo6Pq0Jmnh07x8qj
 mpHFPwxfWAzRiuNyji2iS9ecS2cnlkixNyMWZipXRi4KJAUjJH6YDd7IofUI3Glf
 6v7b6srFSvsWJIJ8LdkJHLHAJuzYnJvFZ8apwgQczEDqgHq7BAunM1sVQ+mydjYk
 EXT4kN6DSBOPzwr9GAay52f8nXhbqdHfT+YTGHPHg+QToojL6gD7vvW57w/QqD/x
 91hJef1z01cSIsDZOxA0EUeD+9bBAHpoamr/e3IOOCVYCN6hy0dGa9g0QGbbpVyE
 AeU4FGZLV9J8OOfvHVraDt5Wn3IXxYaMu22dSn1S6tVinwjXhJR2LAA+t4fGHAkt
 i38LjOsIbopSQn/cNhzwC8UZcHLqnVsdDolHlHzsVFVfcpck2/4JFpUeP8QhWgrk
 f9tY12QUf/oEaWm0/sUCHZNFxpIGeFA5FFXf0Z92clnzBuiuWoesBNvxqY/2DeZn
 IDNXiv+Trxr6kFEjTpzPeuxbWrn4PJ7afQSAFcEmOCguk5riM+zJZNIKg0TxUHSS
 tt6sfxmTP1DhgDKad5kT3MLyzOcx47Kbjf4dj6KmRnD+3DGwwN2F7X7R1GJylPSp
 RLOzJ+Ouuy9UmBN6JMsT4BmR9+FJTVirADU926d/ZqCTtRV8Tnq/6HHmKmmr4CR0
 THJ0PJqQjg==
 =MOve
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux

Pull io_uring zero-copy receive support from Jens Axboe:
 "This adds support for zero-copy receive with io_uring, enabling fast
  bulk receive of data directly into application memory, rather than
  needing to copy the data out of kernel memory.

  While this version only supports host memory as that was the initial
  target, other memory types are planned as well, with notably GPU
  memory coming next.

  This work depends on some networking components which were queued up
  on the networking side, but have now landed in your tree.

  This is the work of Pavel Begunkov and David Wei. From the v14 posting:

    'We configure a page pool that a driver uses to fill a hw rx queue
     to hand out user pages instead of kernel pages. Any data that ends
     up hitting this hw rx queue will thus be dma'd into userspace
     memory directly, without needing to be bounced through kernel
     memory. 'Reading' data out of a socket instead becomes a
     _notification_ mechanism, where the kernel tells userspace where
     the data is. The overall approach is similar to the devmem TCP
     proposal

     This relies on hw header/data split, flow steering and RSS to
     ensure packet headers remain in kernel memory and only desired
     flows hit a hw rx queue configured for zero copy. Configuring this
     is outside of the scope of this patchset.

     We share netdev core infra with devmem TCP. The main difference is
     that io_uring is used for the uAPI and the lifetime of all objects
     are bound to an io_uring instance. Data is 'read' using a new
     io_uring request type. When done, data is returned via a new shared
     refill queue. A zero copy page pool refills a hw rx queue from this
     refill queue directly. Of course, the lifetime of these data
     buffers are managed by io_uring rather than the networking stack,
     with different refcounting rules.

     This patchset is the first step adding basic zero copy support. We
     will extend this iteratively with new features e.g. dynamically
     allocated zero copy areas, THP support, dmabuf support, improved
     copy fallback, general optimisations and more'

  In a local setup, I was able to saturate a 200G link with a single CPU
  core, and at netdev conf 0x19 earlier this month, Jamal reported
  188Gbit of bandwidth using a single core (no HT, including soft-irq).

  Safe to say the efficiency is there, as bigger links would be needed
  to find the per-core limit, and it's considerably more efficient and
  faster than the existing devmem solution"

* tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux:
  io_uring/zcrx: add selftest case for recvzc with read limit
  io_uring/zcrx: add a read limit to recvzc requests
  io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap
  io_uring: Rename KConfig to Kconfig
  io_uring/zcrx: fix leaks on failed registration
  io_uring/zcrx: recheck ifq on shutdown
  io_uring/zcrx: add selftest
  net: add documentation for io_uring zcrx
  io_uring/zcrx: add copy fallback
  io_uring/zcrx: throttle receive requests
  io_uring/zcrx: set pp memory provider for an rx queue
  io_uring/zcrx: add io_recvzc request
  io_uring/zcrx: dma-map area for the device
  io_uring/zcrx: implement zerocopy receive pp memory provider
  io_uring/zcrx: grab a net device
  io_uring/zcrx: add io_zcrx_area
  io_uring/zcrx: add interface queue and refill queue
2025-03-28 13:45:52 -07:00
Linus Torvalds 7288511606 Landlock update for v6.15-rc1
-----BEGIN PGP SIGNATURE-----
 
 iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZ+bGgBAcbWljQGRpZ2lr
 b2QubmV0AAoJEOXj0OiMgvbSKmgBAICZsmQTuKMHIXdB7kwA+BX5k++SZcyA+qHN
 0hrJTSMsAP0Uv6NpiPT4CTduqBMRbuMwNhujBczRiok6yaHDbC8eCw==
 =K8XL
 -----END PGP SIGNATURE-----

Merge tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock updates from Mickaël Salaün:
 "This brings two main changes to Landlock:

   - A signal scoping fix with a new interface for user space to know if
     it is compatible with the running kernel.

   - Audit support to give visibility on why access requests are denied,
     including the origin of the security policy, missing access rights,
     and description of object(s). This was designed to limit log spam
     as much as possible while still alerting about unexpected blocked
     access.

  With these changes come new and improved documentation, and a lot of
  new tests"

* tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (36 commits)
  landlock: Add audit documentation
  selftests/landlock: Add audit tests for network
  selftests/landlock: Add audit tests for filesystem
  selftests/landlock: Add audit tests for abstract UNIX socket scoping
  selftests/landlock: Add audit tests for ptrace
  selftests/landlock: Test audit with restrict flags
  selftests/landlock: Add tests for audit flags and domain IDs
  selftests/landlock: Extend tests for landlock_restrict_self(2)'s flags
  selftests/landlock: Add test for invalid ruleset file descriptor
  samples/landlock: Enable users to log sandbox denials
  landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF
  landlock: Add LANDLOCK_RESTRICT_SELF_LOG_*_EXEC_* flags
  landlock: Log scoped denials
  landlock: Log TCP bind and connect denials
  landlock: Log truncate and IOCTL denials
  landlock: Factor out IOCTL hooks
  landlock: Log file-related denials
  landlock: Log mount-related denials
  landlock: Add AUDIT_LANDLOCK_DOMAIN and log domain status
  landlock: Add AUDIT_LANDLOCK_ACCESS and log ptrace denials
  ...
2025-03-28 12:37:13 -07:00
Bagas Sanjaya 858c9c10c1 iommufd: Fix iommu_vevent_header tables markup
Stephen Rothwell reports htmldocs warnings on iommufd_vevent_header
tables:

Documentation/userspace-api/iommufd:323: ./include/uapi/linux/iommufd.h:1048: CRITICAL: Unexpected section title or transition.

------------------------------------------------------------------------- [docutils]
WARNING: kernel-doc './scripts/kernel-doc -rst -enable-lineno -sphinx-version 8.1.3 ./include/uapi/linux/iommufd.h' processing failed with: Documentation/userspace-api/iommufd:323: ./include/uapi/linux/iommufd.h:1048: (SEVERE/4) Unexpected section title or transition.

-------------------------------------------------------------------------

These are because Sphinx confuses the tables for section headings. Fix
the table markup to squash away above warnings.

Fixes: e36ba5ab80 ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Link: https://patch.msgid.link/r/20250328114654.55840-1-bagasdotme@gmail.com
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250318213359.5dc56fd1@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28 10:07:23 -03:00
Yi Liu 803f97298e iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
PASID usage requires PASID support in both device and IOMMU. Since the
iommu drivers always enable the PASID capability for the device if it
is supported, this extends the IOMMU_GET_HW_INFO to report the PASID
capability to userspace. Also, enhances the selftest accordingly.

Link: https://patch.msgid.link/r/20250321180143.8468-5-yi.l.liu@intel.com
Cc: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> #aarch64 platform
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28 10:07:23 -03:00
Aaron Ruby d499effe1d drm/virtio: Add capset definitions to UAPI
Since the context-type additions to the virtio-gpu spec, these have been
defined locally in guest user-space, and virtio-gpu backend library code.

Now, these capsets have been stabilized, and should be defined in a
common space, in both the virtio_gpu header, and alongside the virtgpu_drm
interface that they apply to.

Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Aaron Ruby <aruby@qnx.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
[dmitry.osipenko@collabora.com: edit commit title]
Link: https://patchwork.freedesktop.org/patch/msgid/YT3PR01MB5857E808EDF6949F2DF517FDAFA12@YT3PR01MB5857.CANPRD01.PROD.OUTLOOK.COM
2025-03-28 04:53:51 +03:00
Linus Torvalds 81d8e5e213 f2fs-for-6.15-rc1
In this round, there are three major updates: 1) folio conversion, 2) refactor
 for mount API conversion, 3) some performance improvement such as direct IO,
 checkpoint speed, and IO priority hints. For stability, there are patches
 which add more sanity checks and fixes some major issues like i_size in
 atomic write operations and write pointer recovery in zoned devices.
 
 Enhancement:
  - huge folio converion work by Matthew Wilcox
  - clean up for mount API conversion by Eric Sandeen
  - improve direct IO speed in the overwrite case
  - add some sanity check on node consistency
  - set highest IO priority for checkpoint thread
  - keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages
  - add ioctl to get IO priority hint
  - add carve_out sysfs node for fsstat
 
 Bug fix:
  - disable nat_bits during umount to avoid potential nat entry corruption
  - fix missing i_size update on atomic writes
  - fix missing discard for active segments
  - fix running out of free segments
  - fix out-of-bounds access in f2fs_truncate_inode_blocks()
  - call f2fs_recover_quota_end() correctly
  - fix potential deadloop in prepare_compress_overwrite()
  - fix the missing write pointer correction for zoned device
  - fix to avoid panic once fallocation fails for pinfile
  - don't retry IO for corrupted data scenario
 
 There are many other clean up patches and minor bug fixes as usual.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmfhjuYACgkQQBSofoJI
 UNJF+hAAip0Sf6bXwt43KR3lmbXg/YBTtPDACOs375Pn8pfRVsAPIAf5e1AnlBET
 rA0KpqUrEZHXenQFF9n4LIoHYqfuUw3EMempuvp4qgQcx15Kaajw+EGWVLYstNy2
 9dELc9DA55f/i1uvHezGfQFy6hMfXasf+tkaYk0z0ZYDStYboLgkVAY+F869q1Dl
 D16Y7Lna12h4eCxDdssIPDsLjFH/2LDn7SmhXsnpZxwK0Zx8JSo83WxXHnzXHxz4
 vUkukInKgqcBDvf6ufW/YF3/tqSs20XEXNK3cI1vyHx1dwij6Us+G0n0WJHL30QE
 zsliecR0X28JP1rJ3ldCjr+4Kxm6/u/Uwpinm2Fm1jL67UY4TZcaARBE8k20I6ND
 j/L1+sXrIdZ1aILM9/bwCjgXiVFdZbvlfGfpTj1duAkQgRd+/s9cjPlo5j5HTad5
 XJmzJz6YOaUNarhP/E31Z9SV9M2kEcmoDOTxKBg6ZcMWXZ27B6Z0ag92prd0GWk6
 rWDuVj0eP/LDY1QedbHRPbg1D84jVgcnldfPaf9ptln993skJGdgS0dkegTqMR0L
 H8RgpWOzWZI53gnQdePdej8diHmD8uRTrLf/oABm//GzTHC5BdwVpArOl1DCuLFF
 YkMtVEicgnWRmL3PgODAdTJXaDi3uGvT116i+lfMlUn9p+t93r0=
 =E+TE
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, there are three major updates: (1) folio conversion,
  (2) refactoring for mount API conversion, (3) some performance
  improvement such as direct IO, checkpoint speed, and IO priority
  hints.

  For stability, there are patches which add more sanity checks and
  fixes some major issues like i_size in atomic write operations and
  write pointer recovery in zoned devices.

  Enhancements:
   - huge folio converion work by Matthew Wilcox
   - clean up for mount API conversion by Eric Sandeen
   - improve direct IO speed in the overwrite case
   - add some sanity check on node consistency
   - set highest IO priority for checkpoint thread
   - keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages
   - add ioctl to get IO priority hint
   - add carve_out sysfs node for fsstat

  Bug fixes:
   - disable nat_bits during umount to avoid potential nat entry corruption
   - fix missing i_size update on atomic writes
   - fix missing discard for active segments
   - fix running out of free segments
   - fix out-of-bounds access in f2fs_truncate_inode_blocks()
   - call f2fs_recover_quota_end() correctly
   - fix potential deadloop in prepare_compress_overwrite()
   - fix the missing write pointer correction for zoned device
   - fix to avoid panic once fallocation fails for pinfile
   - don't retry IO for corrupted data scenario

  There are many other clean up patches and minor bug fixes as usual"

* tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
  f2fs: fix missing discard for active segments
  f2fs: optimize f2fs DIO overwrites
  f2fs: fix to avoid atomicity corruption of atomic file
  f2fs: pass sbi rather than sb to parse_options()
  f2fs: pass sbi rather than sb to quota qf_name helpers
  f2fs: defer readonly check vs norecovery
  f2fs: Pass sbi rather than sb to f2fs_set_test_dummy_encryption
  f2fs: make LAZYTIME a mount option flag
  f2fs: make INLINECRYPT a mount option flag
  f2fs: factor out an f2fs_default_check function
  f2fs: consolidate unsupported option handling errors
  f2fs: use f2fs_sb_has_device_alias during option parsing
  f2fs: add carve_out sysfs node
  f2fs: fix to avoid running out of free segments
  f2fs: Remove f2fs_write_node_page()
  f2fs: Remove f2fs_write_meta_page()
  f2fs: Remove f2fs_write_data_page()
  f2fs: Remove check for ->writepage
  Revert "f2fs: rebuild nat_bits during umount"
  f2fs: fix to avoid accessing uninitialized curseg
  ...
2025-03-27 12:55:54 -07:00
Linus Torvalds fd71def6d9 for-6.15-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmfZy+0ACgkQxWXV+ddt
 WDtcRw//bUfqbabUGBZ+t/a7YahSeukKx7jhHEHDvzaK8LSZj4otZtLtlKZbaNQK
 gGhMitd+rwkf/KnnRvCmS9Y6v4PHbsH8NX0PaGH4ZFYD4mifAs6HNSQUzQIASAZt
 OhX/PaKUdLN6kFOt4Yg8Qtem5LcF9Kmrc43ySkcF1T7KtZey8KZypMf0Af/4KvP/
 QcNiYJiUlotz6m5K0+TjsDVJDKbYPYy07u3/9GHJBN8bEf5jswPmfDJrONd+NDFS
 rMylVCTkW5Hl93qDM0zINPcyfuFFNUH4fWJVRizJPmOwQWUqkRx4J5nSKZzQSlgg
 O3KTEYPJHG388an1Cs/k4oIEpOq2xJ7RKJP8ksPf/IcXOTJ0dLXUQisheRoeGyYR
 04TWP1rZ2vyQI/LzlOiRozCkAWWhLMJMvWXRUTK/9z9Jh2dcbPdykJGQZ11D9hNI
 W5i0XsHX/P2xD8D2sOHo+QY5o1QzMZpb+IaL/+Kv22s3Vb1brabZgOAq8H13l1/y
 oe3RLVSLueth22q4GK/MSi7hxSZwV6Zj5HtxYxfs4RFqWo9sM6mp9xP3Via3MnLA
 fK8FIMYUMqgvqonDqUD8Gv+YV15Haq8icO/2F9b9eiycJ1mSsRILVEiVCJGbBYIz
 C1tB7j5Lv44ZExKHmxPzHMa8rrrG+jaSxxZpuLuOYX0VvVECKVY=
 =t4Jn
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "User visible changes:

   - fall back to buffered write if direct io is done on a file that
     requires checksums
      - this avoids a problem with checksum mismatch errors, observed
        e.g. on virtual images when writes to pages under writeback
        cause the checksum mismatch reports
      - this may lead to some performance degradation but currently the
        recommended setup for VM images is to use the NOCOW file
        attribute that also disables checksums

   - fast/realtime zstd levels -15 to -1
      - supported by mount options (compress=zstd:-5) and defrag ioctl
      - improved speed, reduced compression ratio, check the commit for
        sample measurements

   - defrag ioctl extended to accept negative compression levels

   - subpage mode
      - remove warning when subpage mode is used, the feature is now
        reasonably complete and tested
      - in debug mode allow to create 2K b-tree nodes to allow testing
        subpage on x86_64 with 4K pages too

  Performance improvements:

   - in send, better file path caching improves runtime (on sample load
     by -30%)

   - on s390x with hardware zlib support prepare the input buffer in a
     better way to get the best results from the acceleration

   - minor speed improvement in encoded read, avoid memory allocation in
     synchronous mode

  Core:

   - enable stable writes on inodes, replacing manually waiting for
     writeback and allowing to skip that on inodes without checksums

   - add last checks and warnings for out-of-band dirty writes to pages,
     requiring a fixup ("fixup worker"), this should not be necessary
     since 5.8 where get_user_page() and pin_user_pages*() prevent this
      - long history behind that, we'll be happy to remove the whole
        infrastructure in the near future

   - more folio API conversions and preparations for large folio support

   - subpage cleanups and refactoring, split handling of data and
     metadata to allow future support for large folios

   - readpage works as block-by-block, no change for normal mode, this
     is preparation for future subpage updates

   - block group refcount fixes and hardening

   - delayed iput fixes

   - in zoned mode, fix zone activation on filesystem with missing
     devices

  Cleanups:

   - inode parameter cleanups

   - path auto-freeing updates

   - code flow simplifications in send

   - redundant parameter cleanups"

* tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (164 commits)
  btrfs: zoned: fix zone finishing with missing devices
  btrfs: zoned: fix zone activation with missing devices
  btrfs: remove end_no_trans label from btrfs_log_inode_parent()
  btrfs: simplify condition for logging new dentries at btrfs_log_inode_parent()
  btrfs: remove redundant else statement from btrfs_log_inode_parent()
  btrfs: use memcmp_extent_buffer() at replay_one_extent()
  btrfs: update outdated comment for overwrite_item()
  btrfs: use variables to store extent buffer and slot at overwrite_item()
  btrfs: avoid unnecessary memory allocation and copy at overwrite_item()
  btrfs: don't clobber ret in btrfs_validate_super()
  btrfs: prepare btrfs_page_mkwrite() for large folios
  btrfs: prepare extent_io.c for future large folio support
  btrfs: prepare btrfs_launcher_folio() for large folios support
  btrfs: replace PAGE_SIZE with folio_size for subpage.[ch]
  btrfs: add a size parameter to btrfs_alloc_subpage()
  btrfs: subpage: make btrfs_is_subpage() check against a folio
  btrfs: add extra warning if delayed iput is added when it's not allowed
  btrfs: avoid redundant path slot assignment in btrfs_search_forward()
  btrfs: remove unnecessary btrfs_key local variable in btrfs_search_forward()
  btrfs: simplify the return value handling in search_ioctl()
  ...
2025-03-27 12:51:48 -07:00
Bjorn Helgaas cc28c0e5e7 Merge branch 'pci/endpoint-test'
- Fix endpoint BAR testing so the test can skip disabled BARs instead of
  reporting them as failures (Niklas Cassel)

- Verify that pci_endpoint interrupt tests set the correct IRQ type
  (Kunihiko Hayashi)

- Fix interpretation of pci_endpoint_test_bars_read_bar() error returns
  (Niklas Cassel)

- Fix potential string truncation in pci_endpoint_test_probe() (Niklas
  Cassel)

- Increase endpoint test BAR size variable to accommodate BARs larger than
  INT_MAX (Niklas Cassel)

- Release IRQs to avoid leak in pci_endpoint interrupt tests (Kunihiko
  Hayashi)

- Log the correct IRQ type when pci_endpoint IRQ request test fails
  (Kunihiko Hayashi)

- Remove pci_endpoint_test irq_type and no_msi globals; instead use
  test->irq_type (Kunihiko Hayashi)

- Remove unnecessary use of managed IRQ functions in pci_endpoint_test
  (Kunihiko Hayashi)

- Add and use IRQ_TYPE_* defines in pci_endpoint_test (Niklas Cassel)

- Add struct pci_epc_features.intx_capable and note that RK3568 and RK3588
  can't raise INTx interrupts (Niklas Cassel)

- Expose supported IRQ types in CAPS so pci_endpoint_test can set
  appropriate type (Niklas Cassel)

- Add PCITEST_IRQ_TYPE_AUTO to pci_endpoint_test for cases where the IRQ
  type doesn't matter (Niklas Cassel)

* pci/endpoint-test:
  misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
  PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
  PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
  PCI: endpoint: Add intx_capable to epc_features struct
  selftests: pci_endpoint: Use IRQ_TYPE_* defines from UAPI header
  misc: pci_endpoint_test: Use IRQ_TYPE_* defines from UAPI header
  PCI: endpoint: pcitest: Add IRQ_TYPE_* defines to UAPI header
  misc: pci_endpoint_test: Do not use managed IRQ functions
  misc: pci_endpoint_test: Remove global 'irq_type' and 'no_msi'
  misc: pci_endpoint_test: Fix 'irq_type' to convey the correct type
  misc: pci_endpoint_test: Fix displaying 'irq_type' after 'request_irq' error
  misc: pci_endpoint_test: Avoid issue of interrupts remaining after request_irq error
  misc: pci_endpoint_test: Handle BAR sizes larger than INT_MAX
  misc: pci_endpoint_test: Give disabled BARs a distinct error code
  misc: pci_endpoint_test: Fix potential truncation in pci_endpoint_test_probe()
  misc: pci_endpoint_test: Fix pci_endpoint_test_bars_read_bar() error handling
  selftests: pci_endpoint: Add GET_IRQTYPE checks to each interrupt test
  selftests: pci_endpoint: Skip disabled BARs
2025-03-27 13:14:46 -05:00
Bjorn Helgaas 38d42a6612 Merge branch 'pci/resource'
- Use pci_resource_n() to simplify BAR/window resource lookup (Ilpo
  Järvinen)

- Fix typo that repeatedly distributed resources to a bridge instead of
  iterating over subordinate bridges, which resulted in too little space to
  assign some BARs (Kai-Heng Feng)

- Relax bridge window tail sizing for optional resources, e.g., IOV BARs,
  to avoid failures when removing and re-adding devices (Ilpo Järvinen)

- Fix a double counting error for I/O resources, as we previously did for
  memory resources (Ilpo Järvinen)

- Use resource_set_{range,size}() helpers in more places (Ilpo Järvinen)

- Add pci_resource_is_iov() to identify IOV resources (Ilpo Järvinen)

- Add pci_resource_num() to look up the BAR number from the resource
  pointer (Ilpo Järvinen)

- Add restore_dev_resource() to simplify code that resources saved device
  resources (Ilpo Järvinen)

- Allow drivers to enable devices even if we haven't assigned optional IOV
  resources to them (Ilpo Järvinen)

- Improve debug output during resource reallocation (Ilpo Järvinen)

- Rework handling of optional resources (IOV BARs, ROMs) to reduce failures
  if we can't allocate them (Ilpo Järvinen)

- Move declarations of pci_rescan_bus_bridge_resize(),
  pci_reassign_bridge_resources(), and CardBus-related sizes from
  include/linux/pci.h to drivers/pci/pci.h since they're not used outside
  the PCI core (Ilpo Järvinen)

- Make pci_setup_bridge() static (Ilpo Järvinen)

- Fix a NULL dereference in the SR-IOV VF creation error path (Shay Drory)

- Fix s390 mmio_read/write syscalls, which didn't cause page faults in some
  cases, which broke vfio-pci lazy mapping on first access (Niklas
  Schnelle)

- Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which was
  disabled only for s390 (Niklas Schnelle)

- Support mmap of PCI resources on s390 except for ISM devices (Niklas
  Schnelle)

* pci/resource:
  s390/pci: Support mmap() of PCI resources except for ISM devices
  s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP
  s390/pci: Fix s390_mmio_read/write syscall page fault handling
  PCI: Fix NULL dereference in SR-IOV VF creation error path
  PCI: Move cardbus IO size declarations into pci/pci.h
  PCI: Make pci_setup_bridge() static
  PCI: Move resource reassignment func declarations into pci/pci.h
  PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h
  PCI: Fix BAR resizing when VF BARs are assigned
  PCI: Do not claim to release resource falsely
  PCI: Increase Resizable BAR support from 512 GB to 128 TB
  PCI: Rework optional resource handling
  PCI: Perform reset_resource() and build fail list in sync
  PCI: Use res->parent to check if resource is assigned
  PCI: Add debug print when releasing resources before retry
  PCI: Indicate optional resource assignment failures
  PCI: Always have realloc_head in __assign_resources_sorted()
  PCI: Extend enable to check for any optional resource
  PCI: Add restore_dev_resource()
  PCI: Remove incorrect comment from pci_reassign_resource()
  PCI: Consolidate assignment loop next round preparation
  PCI: Rename retval to ret
  PCI: Use while loop and break instead of gotos
  PCI: Refactor pdev_sort_resources() & __dev_sort_resources()
  PCI: Converge return paths in __assign_resources_sorted()
  PCI: Add dev & res local variables to resource assignment funcs
  PCI: Add pci_resource_num() helper
  PCI: Check resource_size() separately
  PCI: Add pci_resource_is_iov() to identify IOV resources
  PCI: Use resource_set_{range,size}() helpers
  PCI: Use SZ_* instead of literals in setup-bus.c
  PCI: Fix old_size lower bound in calculate_iosize() too
  PCI: Allow relaxed bridge window tail sizing for optional resources
  PCI: Simplify size1 assignment logic
  PCI: Use min_align, not unrelated add_align, for size0
  PCI: Remove add_align overwrite unrelated to size0
  PCI: Use downstream bridges for distributing resources
  PCI: Cleanup dev->resource + resno to use pci_resource_n()
2025-03-27 13:14:45 -05:00
Bjorn Helgaas 651aa9052c Merge branch 'pci/doe'
- Rename DOE 'protocol' to 'feature' to follow spec terminology (Alistair
  Francis)

- Expose supported DOE features via sysfs (Alistair Francis)

- Allow DOE support to be enabled even if CXL isn't enabled (Alistair
  Francis)

* pci/doe:
  PCI/DOE: Allow enabling DOE without CXL
  PCI/DOE: Expose DOE features via sysfs
  PCI/DOE: Rename Discovery Response Data Object Contents to type
  PCI/DOE: Rename DOE protocol to feature
2025-03-27 13:14:43 -05:00
Linus Torvalds 1a9239bb42 Networking changes for 6.15.
Core & protocols
 ----------------
 
  - Continue Netlink conversions to per-namespace RTNL lock
    (IPv4 routing, routing rules, routing next hops, ARP ioctls).
 
  - Continue extending the use of netdev instance locks. As a driver
    opt-in protect queue operations and (in due course) ethtool
    operations with the instance lock and not RTNL lock.
 
  - Support collecting TCP timestamps (data submitted, sent, acked)
    in BPF, allowing for transparent (to the application) and lower
    overhead tracking of TCP RPC performance.
 
  - Tweak existing networking Rx zero-copy infra to support zero-copy
    Rx via io_uring.
 
  - Optimize MPTCP performance in single subflow mode by 29%.
 
  - Enable GRO on packets which went thru XDP CPU redirect (were queued
    for processing on a different CPU). Improving TCP stream performance
    up to 2x.
 
  - Improve performance of contended connect() by 200% by searching
    for an available 4-tuple under RCU rather than a spin lock.
    Bring an additional 229% improvement by tweaking hash distribution.
 
  - Avoid unconditionally touching sk_tsflags on RX, improving
    performance under UDP flood by as much as 10%.
 
  - Avoid skb_clone() dance in ping_rcv() to improve performance under
    ping flood.
 
  - Avoid FIB lookup in netfilter if socket is available, 20% perf win.
 
  - Rework network device creation (in-kernel) API to more clearly
    identify network namespaces and their roles.
    There are up to 4 namespace roles but we used to have just 2 netns
    pointer arguments, interpreted differently based on context.
 
  - Use sysfs_break_active_protection() instead of trylock to avoid
    deadlocks between unregistering objects and sysfs access.
 
  - Add a new sysctl and sockopt for capping max retransmit timeout
    in TCP.
 
  - Support masking port and DSCP in routing rule matches.
 
  - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
 
  - Support specifying at what time packet should be sent on AF_XDP
    sockets.
 
  - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin users.
 
  - Add Netlink YAML spec for WiFi (nl80211) and conntrack.
 
  - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
    which only need to be exported when IPv6 support is built as a module.
 
  - Age FDB entries based on Rx not Tx traffic in VxLAN, similar
    to normal bridging.
 
  - Allow users to specify source port range for GENEVE tunnels.
 
  - netconsole: allow attaching kernel release, CPU ID and task name
    to messages as metadata
 
 Driver API
 ----------
 
  - Continue rework / fixing of Energy Efficient Ethernet (EEE) across
    the SW layers. Delegate the responsibilities to phylink where possible.
    Improve its handling in phylib.
 
  - Support symmetric OR-XOR RSS hashing algorithm.
 
  - Support tracking and preserving IRQ affinity by NAPI itself.
 
  - Support loopback mode speed selection for interface selftests.
 
 Device drivers
 --------------
 
  - Remove the IBM LCS driver for s390.
 
  - Remove the sb1000 cable modem driver.
 
  - Add support for SFP module access over SMBus.
 
  - Add MCTP transport driver for MCTP-over-USB.
 
  - Enable XDP metadata support in multiple drivers.
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
      - add PCIe TLP Processing Hints (TPH) support for new AMD platforms
      - support dumping RoCE queue state for debug
      - opt into instance locking
    - Intel (100G, ice, idpf):
      - ice: rework MSI-X IRQ management and distribution
      - ice: support for E830 devices
      - iavf: add support for Rx timestamping
      - iavf: opt into instance locking
    - nVidia/Mellanox:
      - mlx4: use page pool memory allocator for Rx
      - mlx5: support for one PTP device per hardware clock
      - mlx5: support for 200Gbps per-lane link modes
      - mlx5: move IPSec policy check after decryption
    - AMD/Solarflare:
      - support FW flashing via devlink
    - Cisco (enic):
      - use page pool memory allocator for Rx
      - enable 32, 64 byte CQEs
      - get max rx/tx ring size from the device
    - Meta (fbnic):
      - support flow steering and RSS configuration
      - report queue stats
      - support TCP segmentation
      - support IRQ coalescing
      - support ring size configuration
    - Marvell/Cavium:
      - support AF_XDP
    - Wangxun:
      - support for PTP clock and timestamping
    - Huawei (hibmcge):
      - checksum offload
      - add more statistics
 
  - Ethernet virtual:
    - VirtIO net:
      - aggressively suppress Tx completions, improve perf by 96% with
        1 CPU and 55% with 2 CPUs
      - expose NAPI to IRQ mapping and persist NAPI settings
    - Google (gve):
      - support XDP in DQO RDA Queue Format
      - opt into instance locking
    - Microsoft vNIC:
      - support BIG TCP
 
  - Ethernet NICs consumer, and embedded:
    - Synopsys (stmmac):
      - cleanup Tx and Tx clock setting and other link-focused cleanups
      - enable SGMII and 2500BASEX mode switching for Intel platforms
      - support Sophgo SG2044
    - Broadcom switches (b53):
      - support for BCM53101
    - TI:
      - iep: add perout configuration support
      - icssg: support XDP
    - Cadence (macb):
      - implement BQL
    - Xilinx (axinet):
      - support dynamic IRQ moderation and changing coalescing at runtime
      - implement BQL
      - report standard stats
    - MediaTek:
      - support phylink managed EEE
    - Intel:
      - igc: don't restart the interface on every XDP program change
    - RealTek (r8169):
      - support reading registers of internal PHYs directly
      - increase max jumbo packet size on RTL8125/RTL8126
    - Airoha:
      - support for RISC-V NPU packet processing unit
      - enable scatter-gather and support MTU up to 9kB
    - Tehuti (tn40xx):
      - support cards with TN4010 MAC and an Aquantia AQR105 PHY
 
  - Ethernet PHYs:
    - support for TJA1102S, TJA1121
    - dp83tg720: add randomized polling intervals for link detection
    - dp83822: support changing the transmit amplitude voltage
    - support for LEDs on 88q2xxx
 
  - CAN:
    - canxl: support Remote Request Substitution bit access
    - flexcan: add S32G2/S32G3 SoC
 
  - WiFi:
    - remove cooked monitor support
    - strict mode for better AP testing
    - basic EPCS support
    - OMI RX bandwidth reduction support
    - batman-adv: add support for jumbo frames
 
  - WiFi drivers:
    - RealTek (rtw88):
      - support RTL8814AE and RTL8814AU
    - RealTek (rtw89):
      - switch using wiphy_lock and wiphy_work
      - add BB context to manipulate two PHY as preparation of MLO
      - improve BT-coexistence mechanism to play A2DP smoothly
    - Intel (iwlwifi):
      - add new iwlmld sub-driver for latest HW/FW combinations
    - MediaTek (mt76):
      - preparation for mt7996 Multi-Link Operation (MLO) support
    - Qualcomm/Atheros (ath12k):
      - continued work on MLO
    - Silabs (wfx):
      - Wake-on-WLAN support
 
  - Bluetooth:
    - add support for skb TX SND/COMPLETION timestamping
    - hci_core: enable buffer flow control for SCO/eSCO
    - coredump: log devcd dumps into the monitor
 
  - Bluetooth drivers:
    - intel: add support to configure TX power
    - nxp: handle bootloader error during cmd5 and cmd7
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfkLC8ACgkQMUZtbf5S
 Irsb5g/+L7oKOf0ALbaV9kxFsoz8AymZfAW9i/27F07omGJGpks8oX6j6rQLgIRO
 OQOFcp7XEdDh1+jh82gHVuPrw2/6lchLtW8ARtzdiQKFr5DRjrsbtua6GRc8iBqA
 DIRCBFoV2HuMkF39Vr09HMa9AZAT7QR2RLsRGpSq8E8Z8xxKz0X7oujs10PFpMTE
 IVKhTrVrk+NDot/IU2hzVpnpup+0ld+T2/ZaBklJGcU8uDffImsqNepHRyCG5UC3
 xz74Ju23MAj24Gct+og0yFUooF+lUltKyVm0FYCDCY3bASTwgY01NR3kEH/0NQvM
 cywLzd/ngHm/SMD2ggVAHkjZUieiIVHdaZ53dgjDeBOQoVP6p0dgUK7EumXX8Mx4
 8ReR2UiGoYRPaq9c4o+IjG4K027MwVK2p+mF1a6MLa+20XcyMbev8FIRbbHtC/V4
 z5/FsOAxcuICWkA1hU9bODrrGzIqemmdRgKG8sGuTJCt/kYGAn72/TCATGNSaCJ0
 00n2jN1aepa7wtywHJ5MhVzxN9iQX7+geUHXz0BI+lK4e1Pmk+vjGksymb9ai2fk
 eQAUV9ekub6q68/J16scD7XeOUM37bTLiMBQeIF8UtZBOJscKiS71zn9QP9Twwxv
 P2pm01RDZUI+z5ZX3hc12Pm1vjRHaAh9S1JpAw/pTOVlQ+mAJEM=
 =XY0S
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Continue Netlink conversions to per-namespace RTNL lock
     (IPv4 routing, routing rules, routing next hops, ARP ioctls)

   - Continue extending the use of netdev instance locks. As a driver
     opt-in protect queue operations and (in due course) ethtool
     operations with the instance lock and not RTNL lock.

   - Support collecting TCP timestamps (data submitted, sent, acked) in
     BPF, allowing for transparent (to the application) and lower
     overhead tracking of TCP RPC performance.

   - Tweak existing networking Rx zero-copy infra to support zero-copy
     Rx via io_uring.

   - Optimize MPTCP performance in single subflow mode by 29%.

   - Enable GRO on packets which went thru XDP CPU redirect (were queued
     for processing on a different CPU). Improving TCP stream
     performance up to 2x.

   - Improve performance of contended connect() by 200% by searching for
     an available 4-tuple under RCU rather than a spin lock. Bring an
     additional 229% improvement by tweaking hash distribution.

   - Avoid unconditionally touching sk_tsflags on RX, improving
     performance under UDP flood by as much as 10%.

   - Avoid skb_clone() dance in ping_rcv() to improve performance under
     ping flood.

   - Avoid FIB lookup in netfilter if socket is available, 20% perf win.

   - Rework network device creation (in-kernel) API to more clearly
     identify network namespaces and their roles. There are up to 4
     namespace roles but we used to have just 2 netns pointer arguments,
     interpreted differently based on context.

   - Use sysfs_break_active_protection() instead of trylock to avoid
     deadlocks between unregistering objects and sysfs access.

   - Add a new sysctl and sockopt for capping max retransmit timeout in
     TCP.

   - Support masking port and DSCP in routing rule matches.

   - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.

   - Support specifying at what time packet should be sent on AF_XDP
     sockets.

   - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
     users.

   - Add Netlink YAML spec for WiFi (nl80211) and conntrack.

   - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
     which only need to be exported when IPv6 support is built as a
     module.

   - Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
     normal bridging.

   - Allow users to specify source port range for GENEVE tunnels.

   - netconsole: allow attaching kernel release, CPU ID and task name to
     messages as metadata

  Driver API:

   - Continue rework / fixing of Energy Efficient Ethernet (EEE) across
     the SW layers. Delegate the responsibilities to phylink where
     possible. Improve its handling in phylib.

   - Support symmetric OR-XOR RSS hashing algorithm.

   - Support tracking and preserving IRQ affinity by NAPI itself.

   - Support loopback mode speed selection for interface selftests.

  Device drivers:

   - Remove the IBM LCS driver for s390

   - Remove the sb1000 cable modem driver

   - Add support for SFP module access over SMBus

   - Add MCTP transport driver for MCTP-over-USB

   - Enable XDP metadata support in multiple drivers

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - add PCIe TLP Processing Hints (TPH) support for new AMD
           platforms
         - support dumping RoCE queue state for debug
         - opt into instance locking
      - Intel (100G, ice, idpf):
         - ice: rework MSI-X IRQ management and distribution
         - ice: support for E830 devices
         - iavf: add support for Rx timestamping
         - iavf: opt into instance locking
      - nVidia/Mellanox:
         - mlx4: use page pool memory allocator for Rx
         - mlx5: support for one PTP device per hardware clock
         - mlx5: support for 200Gbps per-lane link modes
         - mlx5: move IPSec policy check after decryption
      - AMD/Solarflare:
         - support FW flashing via devlink
      - Cisco (enic):
         - use page pool memory allocator for Rx
         - enable 32, 64 byte CQEs
         - get max rx/tx ring size from the device
      - Meta (fbnic):
         - support flow steering and RSS configuration
         - report queue stats
         - support TCP segmentation
         - support IRQ coalescing
         - support ring size configuration
      - Marvell/Cavium:
         - support AF_XDP
      - Wangxun:
         - support for PTP clock and timestamping
      - Huawei (hibmcge):
         - checksum offload
         - add more statistics

   - Ethernet virtual:
      - VirtIO net:
         - aggressively suppress Tx completions, improve perf by 96%
           with 1 CPU and 55% with 2 CPUs
         - expose NAPI to IRQ mapping and persist NAPI settings
      - Google (gve):
         - support XDP in DQO RDA Queue Format
         - opt into instance locking
      - Microsoft vNIC:
         - support BIG TCP

   - Ethernet NICs consumer, and embedded:
      - Synopsys (stmmac):
         - cleanup Tx and Tx clock setting and other link-focused
           cleanups
         - enable SGMII and 2500BASEX mode switching for Intel platforms
         - support Sophgo SG2044
      - Broadcom switches (b53):
         - support for BCM53101
      - TI:
         - iep: add perout configuration support
         - icssg: support XDP
      - Cadence (macb):
         - implement BQL
      - Xilinx (axinet):
         - support dynamic IRQ moderation and changing coalescing at
           runtime
         - implement BQL
         - report standard stats
      - MediaTek:
         - support phylink managed EEE
      - Intel:
         - igc: don't restart the interface on every XDP program change
      - RealTek (r8169):
         - support reading registers of internal PHYs directly
         - increase max jumbo packet size on RTL8125/RTL8126
      - Airoha:
         - support for RISC-V NPU packet processing unit
         - enable scatter-gather and support MTU up to 9kB
      - Tehuti (tn40xx):
         - support cards with TN4010 MAC and an Aquantia AQR105 PHY

   - Ethernet PHYs:
      - support for TJA1102S, TJA1121
      - dp83tg720: add randomized polling intervals for link detection
      - dp83822: support changing the transmit amplitude voltage
      - support for LEDs on 88q2xxx

   - CAN:
      - canxl: support Remote Request Substitution bit access
      - flexcan: add S32G2/S32G3 SoC

   - WiFi:
      - remove cooked monitor support
      - strict mode for better AP testing
      - basic EPCS support
      - OMI RX bandwidth reduction support
      - batman-adv: add support for jumbo frames

   - WiFi drivers:
      - RealTek (rtw88):
         - support RTL8814AE and RTL8814AU
      - RealTek (rtw89):
         - switch using wiphy_lock and wiphy_work
         - add BB context to manipulate two PHY as preparation of MLO
         - improve BT-coexistence mechanism to play A2DP smoothly
      - Intel (iwlwifi):
         - add new iwlmld sub-driver for latest HW/FW combinations
      - MediaTek (mt76):
         - preparation for mt7996 Multi-Link Operation (MLO) support
      - Qualcomm/Atheros (ath12k):
         - continued work on MLO
      - Silabs (wfx):
         - Wake-on-WLAN support

   - Bluetooth:
      - add support for skb TX SND/COMPLETION timestamping
      - hci_core: enable buffer flow control for SCO/eSCO
      - coredump: log devcd dumps into the monitor

   - Bluetooth drivers:
      - intel: add support to configure TX power
      - nxp: handle bootloader error during cmd5 and cmd7"

* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
  unix: fix up for "apparmor: add fine grained af_unix mediation"
  mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
  net: usb: asix: ax88772: Increase phy_name size
  net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
  net: libwx: fix Tx L4 checksum
  net: libwx: fix Tx descriptor content for some tunnel packets
  atm: Fix NULL pointer dereference
  net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
  net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
  net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
  net: phy: aquantia: add essential functions to aqr105 driver
  net: phy: aquantia: search for firmware-name in fwnode
  net: phy: aquantia: add probe function to aqr105 for firmware loading
  net: phy: Add swnode support to mdiobus_scan
  gve: add XDP DROP and PASS support for DQ
  gve: update XDP allocation path support RX buffer posting
  gve: merge packet buffer size fields
  gve: update GQ RX to use buf_size
  gve: introduce config-based allocation for XDP
  gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
  ...
2025-03-26 21:48:21 -07:00
Linus Torvalds 9b960d8cd6 for-6.15/block-20250322
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfe8BkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvTqD/4pOeGi/QfLyocn4TcJcidRGZAvBxecTVuM
 upeyr+dCyCi9Wk+EJKeAFooGe15upzxDxKj06HhCixaLx4etDK78uGV4FMM1Z4oa
 2dtchz1Zd0HyBPgQIUY8OuOgbS7tstMS/KdvL+gr5IjfapeTF+54WVLCD8eVyvO/
 vUIppgJBhrqy2qui4xF2lw4t2COt+/PqinGQuYALn4V4Po9NWA7lSh3ZI4F/byj1
 v68jXyt2fqCAyxwkzRDv4GxhN8c6W+TPJpzivrEAuSkLacovESKztinOrafrBnLR
 zdyO4n0V0yGOXbAcxRbADVA4HUsqhLl4JRnvE5P5zIaD7rkE0UqggF7vrSeCvVA1
 hsi1BhkAMNimKX7CZMnT3dJpxRQj1eDJxpwUAusLHWjMyQbNFhV7WAtthMtVJon8
 lAS4e5+xzjqKhF15GpVg5Lzy8SAwdqgNXwwq2zbM8OaPKG0FpajG8DXAqqcj4fpy
 WXnwg72KZDmRcSNJhVZK6B9xSAwIMXPgH4ClCMP9/xlw8EDpM38MDmzrs35TAVtI
 HGE3Qv9CjFjVj/OG3el+bTGIQJFVgYEVPV5TYfNCpKoxpj5cLn5OQY5u6MJawtgK
 HeDgKv3jw3lHatDALMVfwJqqVlUht0R6SIxtP9WHV+CcFrqN1LJKmdhDQbm7b4XK
 EbbawIsdxw==
 =Ci5m
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - Fixes for integrity handling

 - NVMe pull request via Keith:
      - Secure concatenation for TCP transport (Hannes)
      - Multipath sysfs visibility (Nilay)
      - Various cleanups (Qasim, Baruch, Wang, Chen, Mike, Damien, Li)
      - Correct use of 64-bit BARs for pci-epf target (Niklas)
      - Socket fix for selinux when used in containers (Peijie)

 - MD pull request via Yu:
      - fix recovery can preempt resync (Li Nan)
      - fix md-bitmap IO limit (Su Yue)
      - fix raid10 discard with REQ_NOWAIT (Xiao Ni)
      - fix raid1 memory leak (Zheng Qixing)
      - fix mddev uaf (Yu Kuai)
      - fix raid1,raid10 IO flags (Yu Kuai)
      - some refactor and cleanup (Yu Kuai)

 - Series cleaning up and fixing bugs in the bad block handling code

 - Improve support for write failure simulation in null_blk

 - Various lock ordering fixes

 - Fixes for locking for debugfs attributes

 - Various ublk related fixes and improvements

 - Cleanups for blk-rq-qos wait handling

 - blk-throttle fixes

 - Fixes for loop dio and sync handling

 - Fixes and cleanups for the auto-PI code

 - Block side support for hardware encryption keys in blk-crypto

 - Various cleanups and fixes

* tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux: (105 commits)
  nvmet: replace max(a, min(b, c)) by clamp(val, lo, hi)
  nvme-tcp: fix selinux denied when calling sock_sendmsg
  nvmet: pci-epf: Always configure BAR0 as 64-bit
  nvmet: Remove duplicate uuid_copy
  nvme: zns: Simplify nvme_zone_parse_entry()
  nvmet: pci-epf: Remove redundant 'flush_workqueue()' calls
  nvmet-fc: Remove unused functions
  nvme-pci: remove stale comment
  nvme-fc: Utilise min3() to simplify queue count calculation
  nvme-multipath: Add visibility for queue-depth io-policy
  nvme-multipath: Add visibility for numa io-policy
  nvme-multipath: Add visibility for round-robin io-policy
  nvmet: add tls_concat and tls_key debugfs entries
  nvmet-tcp: support secure channel concatenation
  nvmet: Add 'sq' argument to alloc_ctrl_args
  nvme-fabrics: reset admin connection for secure concatenation
  nvme-tcp: request secure channel concatenation
  nvme-keyring: add nvme_tls_psk_refresh()
  nvme: add nvme_auth_derive_tls_psk()
  nvme: add nvme_auth_generate_digest()
  ...
2025-03-26 18:08:55 -07:00
Linus Torvalds 91928e0d3c for-6.15/io_uring-20250322
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmfe8DcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprzNEADPOOfi05F0f4KOMLYOiRlB+D9yGtnbcu8k
 foPATU++L5fFDUpjAGobi38AXqHtF/Bx0CjCkPCvy4GjygcszYDY81FQmRlSFdeH
 CnYV/bm5myyupc8xANcWGJYpxf/ZAw7ugOAgcR3Xxk/qXdD5hYJ1jG4477o8v4IJ
 jyzE4vQCnWBogIoqpWTQyY3FSP5fdjeQlYdFGIFCx68rFTO9zEpuHhFr2OMfJtSu
 y1dLkeqbu1BZ0fhgRog9k9ZWk0uRQJI7d7EUXf9iwXOfkUyzCaV+nHvNGRdhJ/XO
 zj/qePw4lFyCgc+kuIejjIaPf/g84paCzshz/PIxOZTsvo6X78zj72RPM0Makhga
 amgtEqegWHtKSya1zew57oDwqwaIe6b+56PtNSP7K8IrwcJLheszoPKJIwiiynIL
 ZqvWseqYQXs58S9qZzSZlyEpwLVFcvyNt7Z/q7b56vhvXHabxOccPkUHt+UscVoQ
 qpG0R9+zmIiDfzY6Vfu7JyH2Uspdq3WjGYiaoVUko1rJXY+HFPd3be9pIbckyaA8
 ZQHIxB+h8IlxhpQ8HJKQpZhtYiluxEu6wDdvtpwC0KiGp5g4Q5eg9SkVyvETIEN8
 yW2tHqMAJJtJjDAPwJOV+GIyWQMIexzLmMP7Fq8R5VIoAox+Xn3flgNNEnMZdl/r
 kTc/hRjHvQ==
 =VEQr
 -----END PGP SIGNATURE-----

Merge tag 'for-6.15/io_uring-20250322' of git://git.kernel.dk/linux

Pull io_uring updates from Jens Axboe:
 "This is the first of the io_uring pull requests for the 6.15 merge
  window, there will be others once the net tree has gone in. This
  contains:

   - Cleanup and unification of cancelation handling across various
     request types.

   - Improvement for bundles, supporting them both for incrementally
     consumed buffers, and for non-multishot requests.

   - Enable toggling of using iowait while waiting on io_uring events or
     not. Unfortunately this is still tied with CPU frequency boosting
     on short waits, as the scheduler side has not been very receptive
     to splitting the (useless) iowait stat from the cpufreq implied
     boost.

   - Add support for kbuf nodes, enabling zero-copy support for the ublk
     block driver.

   - Various cleanups for resource node handling.

   - Series greatly cleaning up the legacy provided (non-ring based)
     buffers. For years, we've been pushing the ring provided buffers as
     the way to go, and that is what people have been using. Reduce the
     complexity and code associated with legacy provided buffers.

   - Series cleaning up the compat handling.

   - Series improving and cleaning up the recvmsg/sendmsg iovec and msg
     handling.

   - Series of cleanups for io-wq.

   - Start adding a bunch of selftests. The liburing repository
     generally carries feature and regression tests for everything, but
     at least for ublk initially, we'll try and go the route of having
     it in selftests as well. We'll see how this goes, might decide to
     migrate more tests this way in the future.

   - Various little cleanups and fixes"

* tag 'for-6.15/io_uring-20250322' of git://git.kernel.dk/linux: (108 commits)
  selftests: ublk: add stripe target
  selftests: ublk: simplify loop io completion
  selftests: ublk: enable zero copy for null target
  selftests: ublk: prepare for supporting stripe target
  selftests: ublk: move common code into common.c
  selftests: ublk: increase max buffer size to 1MB
  selftests: ublk: add single sqe allocator helper
  selftests: ublk: add generic_01 for verifying sequential IO order
  selftests: ublk: fix starting ublk device
  io_uring: enable toggle of iowait usage when waiting on CQEs
  selftests: ublk: fix write cache implementation
  selftests: ublk: add variable for user to not show test result
  selftests: ublk: don't show `modprobe` failure
  selftests: ublk: add one dependency header
  io_uring/kbuf: enable bundles for incrementally consumed buffers
  Revert "io_uring/rsrc: simplify the bvec iter count calculation"
  selftests: ublk: improve test usability
  selftests: ublk: add stress test for covering IO vs. killing ublk server
  selftests: ublk: add one stress test for covering IO vs. removing device
  selftests: ublk: load/unload ublk_drv when preparing & cleaning up tests
  ...
2025-03-26 17:56:00 -07:00
Mickaël Salaün ead9079f75
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF
Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF for the case of sandboxer
tools, init systems, or runtime containers launching programs sandboxing
themselves in an inconsistent way.  Setting this flag should only
depends on runtime configuration (i.e. not hardcoded).

We don't create a new ruleset's option because this should not be part
of the security policy: only the task that enforces the policy (not the
one that create it) knows if itself or its children may request denied
actions.

This is the first and only flag that can be set without actually
restricting the caller (i.e. without providing a ruleset).

Extend struct landlock_cred_security with a u8 log_subdomains_off.
struct landlock_file_security is still 16 bytes.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Closes: https://github.com/landlock-lsm/linux/issues/3
Link: https://lore.kernel.org/r/20250320190717.2287696-19-mic@digikod.net
[mic: Fix comment]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:43 +01:00
Mickaël Salaün 12bfcda73a
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_*_EXEC_* flags
Most of the time we want to log denied access because they should not
happen and such information helps diagnose issues.  However, when
sandboxing processes that we know will try to access denied resources
(e.g. unknown, bogus, or malicious binary), we might want to not log
related access requests that might fill up logs.

By default, denied requests are logged until the task call execve(2).

If the LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF flag is set, denied
requests will not be logged for the same executed file.

If the LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON flag is set, denied
requests from after an execve(2) call will be logged.

The rationale is that a program should know its own behavior, but not
necessarily the behavior of other programs.

Because LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF is set for a specific
Landlock domain, it makes it possible to selectively mask some access
requests that would be logged by a parent domain, which might be handy
for unprivileged processes to limit logs.  However, system
administrators should still use the audit filtering mechanism.  There is
intentionally no audit nor sysctl configuration to re-enable these logs.
This is delegated to the user space program.

Increment the Landlock ABI version to reflect this interface change.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-18-mic@digikod.net
[mic: Rename variables and fix __maybe_unused]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:42 +01:00
Mickaël Salaün 1d636984e0
landlock: Add AUDIT_LANDLOCK_DOMAIN and log domain status
Asynchronously log domain information when it first denies an access.
This minimize the amount of generated logs, which makes it possible to
always log denials for the current execution since they should not
happen.  These records are identified with the new AUDIT_LANDLOCK_DOMAIN
type.

The AUDIT_LANDLOCK_DOMAIN message contains:
- the "domain" ID which is described;
- the "status" which can either be "allocated" or "deallocated";
- the "mode" which is for now only "enforcing";
- for the "allocated" status, a minimal set of properties to easily
  identify the task that loaded the domain's policy with
  landlock_restrict_self(2): "pid", "uid", executable path ("exe"), and
  command line ("comm");
- for the "deallocated" state, the number of "denials" accounted to this
  domain, which is at least 1.

This requires each domain to save these task properties at creation
time in the new struct landlock_details.  A reference to the PID is kept
for the lifetime of the domain to avoid race conditions when
investigating the related task.  The executable path is resolved and
stored to not keep a reference to the filesystem and block related
actions.  All these metadata are stored for the lifetime of the related
domain and should then be minimal.  The required memory is not accounted
to the task calling landlock_restrict_self(2) contrary to most other
Landlock allocations (see related comment).

The AUDIT_LANDLOCK_DOMAIN record follows the first AUDIT_LANDLOCK_ACCESS
record for the same domain, which is always followed by AUDIT_SYSCALL
and AUDIT_PROCTITLE.  This is in line with the audit logic to first
record the cause of an event, and then add context with other types of
record.

Audit event sample for a first denial:

  type=LANDLOCK_ACCESS msg=audit(1732186800.349:44): domain=195ba459b blockers=ptrace opid=1 ocomm="systemd"
  type=LANDLOCK_DOMAIN msg=audit(1732186800.349:44): domain=195ba459b status=allocated mode=enforcing pid=300 uid=0 exe="/root/sandboxer" comm="sandboxer"
  type=SYSCALL msg=audit(1732186800.349:44): arch=c000003e syscall=101 success=no [...] pid=300 auid=0

Audit event sample for a following denial:

  type=LANDLOCK_ACCESS msg=audit(1732186800.372:45): domain=195ba459b blockers=ptrace opid=1 ocomm="systemd"
  type=SYSCALL msg=audit(1732186800.372:45): arch=c000003e syscall=101 success=no [...] pid=300 auid=0

Log domain deletion with the "deallocated" state when a domain was
previously logged.  This makes it possible for log parsers to free
potential resources when a domain ID will never show again.

The number of denied access requests is useful to easily check how many
access requests a domain blocked and potentially if some of them are
missing in logs because of audit rate limiting, audit rules, or Landlock
log configuration flags (see following commit).

Audit event sample for a deletion of a domain that denied something:

  type=LANDLOCK_DOMAIN msg=audit(1732186800.393:46): domain=195ba459b status=deallocated denials=2

Cc: Günther Noack <gnoack@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-11-mic@digikod.net
[mic: Update comment and GFP flag for landlock_log_drop_domain()]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:38 +01:00
Mickaël Salaün 33e65b0d3a
landlock: Add AUDIT_LANDLOCK_ACCESS and log ptrace denials
Add a new AUDIT_LANDLOCK_ACCESS record type dedicated to an access
request denied by a Landlock domain.  AUDIT_LANDLOCK_ACCESS indicates
that something unexpected happened.

For now, only denied access are logged, which means that any
AUDIT_LANDLOCK_ACCESS record is always followed by a SYSCALL record with
"success=no".  However, log parsers should check this syscall property
because this is the only sign that a request was denied.  Indeed, we
could have "success=yes" if Landlock would support a "permissive" mode.
We could also add a new field to AUDIT_LANDLOCK_DOMAIN for this mode
(see following commit).

By default, the only logged access requests are those coming from the
same executed program that enforced the Landlock restriction on itself.
In other words, no audit record are created for a task after it called
execve(2).  This is required to avoid log spam because programs may only
be aware of their own restrictions, but not the inherited ones.

Following commits will allow to conditionally generate
AUDIT_LANDLOCK_ACCESS records according to dedicated
landlock_restrict_self(2)'s flags.

The AUDIT_LANDLOCK_ACCESS message contains:
- the "domain" ID restricting the action on an object,
- the "blockers" that are missing to allow the requested access,
- a set of fields identifying the related object (e.g. task identified
  with "opid" and "ocomm").

The blockers are implicit restrictions (e.g. ptrace), or explicit access
rights (e.g. filesystem), or explicit scopes (e.g. signal).  This field
contains a list of at least one element, each separated with a comma.

The initial blocker is "ptrace", which describe all implicit Landlock
restrictions related to ptrace (e.g. deny tracing of tasks outside a
sandbox).

Add audit support to ptrace_access_check and ptrace_traceme hooks.  For
the ptrace_access_check case, we log the current/parent domain and the
child task.  For the ptrace_traceme case, we log the parent domain and
the current/child task.  Indeed, the requester and the target are the
current task, but the action would be performed by the parent task.

Audit event sample:

  type=LANDLOCK_ACCESS msg=audit(1729738800.349:44): domain=195ba459b blockers=ptrace opid=1 ocomm="systemd"
  type=SYSCALL msg=audit(1729738800.349:44): arch=c000003e syscall=101 success=no [...] pid=300 auid=0

A following commit adds user documentation.

Add KUnit tests to check reading of domain ID relative to layer level.

The quick return for non-landlocked tasks is moved from task_ptrace() to
each LSM hooks.

It is not useful to inline the audit_enabled check because other
computation are performed by landlock_log_denial().

Use scoped guards for RCU read-side critical sections.

Cc: Günther Noack <gnoack@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-10-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-26 13:59:38 +01:00
Niklas Cassel 08818c6d7f
misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
For PCITEST_MSI we really want to set PCITEST_SET_IRQTYPE explicitly
to PCITEST_IRQ_TYPE_MSI, since we want to test if MSI works.

For PCITEST_MSIX we really want to set PCITEST_SET_IRQTYPE explicitly
to PCITEST_IRQ_TYPE_MSIX, since we want to test if MSI works.

For PCITEST_LEGACY_IRQ we really want to set PCITEST_SET_IRQTYPE
explicitly to PCITEST_IRQ_TYPE_INTX, since we want to test if INTx
works.

However, for PCITEST_WRITE, PCITEST_READ, PCITEST_COPY, we really don't
care which IRQ type that is used, we just want to use a IRQ type that is
supported by the EPC.

The old behavior was to always use MSI for PCITEST_WRITE, PCITEST_READ,
PCITEST_COPY, was to always set IRQ type to MSI before doing the actual
test, however, there are EPC drivers that do not support MSI.

Add a new PCITEST_IRQ_TYPE_AUTO, that will use the CAPS register to see
which IRQ types the endpoint supports, and use one of the supported IRQ
types.

For backwards compatibility, if the endpoint does not expose any supported
IRQ type in the CAPS register, simply fallback to using MSI, as it was
unconditionally done before.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250310111016.859445-16-cassel@kernel.org
2025-03-26 06:11:54 +00:00
Linus Torvalds 1e26c5e28c [GIT PULL for v6.15] media updates
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmfbCbgACgkQCF8+vY7k
 4RX20A/+Lec4h7TqDC3ctPHVvZ6dN+xMjqx5RnhsJ2yo7NqrbgrZ1HAC6nNkUCAP
 ZbhVUBEVYN3pxCiA1Qj6YuainQkWW5qPIB1ACQ/spdXacluQaPuub03LXLzbn6Qh
 inojtO1v04q2LPtnl7Lv1F0aUbXQK2JPMOlByrgX/XdFYq3B91uV5Z7SeBnsSshW
 usOdh3Dv1QmIlHvlSLFlAVPAE1PdfvDmV3XmhSy4HZM+zhQjYia563F+9lj5tUe9
 F7mX7fGlh9AL2uiucH7GlW3U6+SNc0JwNr3Ra9LdBvqksU8TYp1Dt+lvqpscTyuY
 DPMZS5i/YfZPFukPNGMf7RKKP9/gaKWza684PTq5XsASUAiGwpsiyPHM2rJYXf9u
 kAzLJH7vk7sLdssgjFONwOLJlD93EXclT7myDpNCbAyD7tni7iTfGyxmNYFFLc/w
 NzqfowZfHSxYwDYzoXUGcU2lksMqWrHF+8oO2de0NUxhrLjawcGj3acCe1R4zU0W
 CoKDCI8YXoBI2YNrF1Nvnca/2qkPXi1IKyVdiDGMv4aeuja2hB8Z2IKssSP8Jjlp
 ui97q2MR/WwYAKkNTyH8F0kV6x6E5FqVbVAm6wp+d4uzi2QSGPFD8kuXwTmohYGa
 ZbVXBtjM3ZxO5rb/LsCPAPy9USR3J3Jkt5rVfaRo+EusU+8Iqig=
 =RsqI
 -----END PGP SIGNATURE-----

Merge tag 'media/v6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - platform: synopsys: hdmirx: Fix 64-bit division for 32-bit targets

 - vim2m: print device name after registering device

 - Synopsys DesignWare HDMI RX Driver and various fixes

 - cec/printk fixes and the removal of the vidioc_g/s_ctrl and
   vidioc_queryctrl callbacks

 - AVerMedia H789-C PCIe support and rc-core structs padding

 - Several camera sensor patches

 - uvcvideo improvements

 - visl: Fix ERANGE error when setting enum controls

 - codec fixes

 - V4L2 camera sensor patches mostly

 - chips-media: wave5: Fixes

 - Add SDM670 camera subsystem

 - Qualcomm iris video decoder driver

 - dt-bindings: update clocks for sc7280-camss

 - various fixes and enhancements

* tag 'media/v6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (264 commits)
  media: pci: mgb4: include linux/errno.h
  media: synopsys: hdmirx: Fix signedness bug in hdmirx_parse_dt()
  media: platform: synopsys: hdmirx: Fix 64-bit division for 32-bit targets
  media: vim2m: print device name after registering device
  media: vivid: Introduce VIDEO_VIVID_OSD
  media: vivid: Move all fb_info references into vivid-osd
  media: platform: synopsys: hdmirx: Optimize struct snps_hdmirx_dev
  media: platform: synopsys: hdmirx: Remove unused HDMI audio CODEC relics
  media: platform: synopsys: hdmirx: Remove duplicated header inclusion
  media: qcom: Clean up Kconfig dependencies
  media: dvb-frontends: tda10048: Make the range of z explicit.
  media: platform: stm32: Add check for clk_enable()
  media: xilinx-tpg: fix double put in xtpg_parse_of()
  media: siano: Fix error handling in smsdvb_module_init()
  media: c8sectpfe: Call of_node_put(i2c_bus) only once in c8sectpfe_probe()
  media: i2c: tda1997x: Call of_node_put(ep) only once in tda1997x_parse_dt()
  dt-bindings: media: mediatek,vcodec: Revise description
  dt-bindings: media: mediatek,jpeg: Relax IOMMU max item count
  media: v4l2-dv-timings: prevent possible overflow in v4l2_detect_gtf()
  media: rockchip: rga: fix rga offset lookup
  ...
2025-03-25 21:00:31 -07:00
Linus Torvalds a5b3d8660b hyperv-next for 6.15
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmfhlLATHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXgchCADOz33rSm4G4w4r0qT05dTDi/lZkEdK
 64dQq322XXP/C9FfR66d30243gsAmuM5a0SvzFHLXAOu6yqM270Xehd/Rud+Um2s
 lSVnc0Ux0AWBgksqFd0t577aN7zmJEukosEYO5lBNop+zOcadrm3S6Th/AoL2h/D
 yphPkhH13bsCK+Wll/eBOQLIhC9iA0konYbBLuEQ5MqvUbrzc6Rmb5gxsHHZKOqg
 vLjkrYR/d3s2gIpKxiFp0RwvzGyffZEHxvU/YF3hTenPMlTlnXWbyspBSTVmWggP
 13IFLzqxDdW9RgUnGB4xRc424AC1LKqEr42QPQE7zGvl2jdJriA2Q1LT
 =BXqj
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - Add support for running as the root partition in Hyper-V (Microsoft
   Hypervisor) by exposing /dev/mshv (Nuno and various people)

 - Add support for CPU offlining in Hyper-V (Hamza Mahfooz)

 - Misc fixes and cleanups (Roman Kisel, Tianyu Lan, Wei Liu, Michael
   Kelley, Thorsten Blum)

* tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits)
  x86/hyperv: fix an indentation issue in mshyperv.h
  x86/hyperv: Add comments about hv_vpset and var size hypercall input args
  Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs
  hyperv: Add definitions for root partition driver to hv headers
  x86: hyperv: Add mshv_handler() irq handler and setup function
  Drivers: hv: Introduce per-cpu event ring tail
  Drivers: hv: Export some functions for use by root partition module
  acpi: numa: Export node_to_pxm()
  hyperv: Introduce hv_recommend_using_aeoi()
  arm64/hyperv: Add some missing functions to arm64
  x86/mshyperv: Add support for extended Hyper-V features
  hyperv: Log hypercall status codes as strings
  x86/hyperv: Fix check of return value from snp_set_vmsa()
  x86/hyperv: Add VTL mode callback for restarting the system
  x86/hyperv: Add VTL mode emergency restart callback
  hyperv: Remove unused union and structs
  hyperv: Add CONFIG_MSHV_ROOT to gate root partition support
  hyperv: Change hv_root_partition into a function
  hyperv: Convert hypercall statuses to linux error codes
  drivers/hv: add CPU offlining support
  ...
2025-03-25 14:47:04 -07:00
Linus Torvalds edb0e8f6e2 ARM:
* Nested virtualization support for VGICv3, giving the nested
 hypervisor control of the VGIC hardware when running an L2 VM
 
 * Removal of 'late' nested virtualization feature register masking,
   making the supported feature set directly visible to userspace
 
 * Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
   of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers
 
 * Paravirtual interface for discovering the set of CPU implementations
   where a VM may run, addressing a longstanding issue of guest CPU
   errata awareness in big-little systems and cross-implementation VM
   migration
 
 * Userspace control of the registers responsible for identifying a
   particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
   allowing VMs to be migrated cross-implementation
 
 * pKVM updates, including support for tracking stage-2 page table
   allocations in the protected hypervisor in the 'SecPageTable' stat
 
 * Fixes to vPMU, ensuring that userspace updates to the vPMU after
   KVM_RUN are reflected into the backing perf events
 
 LoongArch:
 
 * Remove unnecessary header include path
 
 * Assume constant PGD during VM context switch
 
 * Add perf events support for guest VM
 
 RISC-V:
 
 * Disable the kernel perf counter during configure
 
 * KVM selftests improvements for PMU
 
 * Fix warning at the time of KVM module removal
 
 x86:
 
 * Add support for aging of SPTEs without holding mmu_lock.  Not taking mmu_lock
   allows multiple aging actions to run in parallel, and more importantly avoids
   stalling vCPUs.  This includes an implementation of per-rmap-entry locking;
   aging the gfn is done with only a per-rmap single-bin spinlock taken, whereas
   locking an rmap for write requires taking both the per-rmap spinlock and
   the mmu_lock.
 
   Note that this decreases slightly the accuracy of accessed-page information,
   because changes to the SPTE outside aging might not use atomic operations
   even if they could race against a clear of the Accessed bit.  This is
   deliberate because KVM and mm/ tolerate false positives/negatives for
   accessed information, and testing has shown that reducing the latency of
   aging is far more beneficial to overall system performance than providing
   "perfect" young/old information.
 
 * Defer runtime CPUID updates until KVM emulates a CPUID instruction, to
   coalesce updates when multiple pieces of vCPU state are changing, e.g. as
   part of a nested transition.
 
 * Fix a variety of nested emulation bugs, and add VMX support for synthesizing
   nested VM-Exit on interception (instead of injecting #UD into L2).
 
 * Drop "support" for async page faults for protected guests that do not set
   SEND_ALWAYS (i.e. that only want async page faults at CPL3)
 
 * Bring a bit of sanity to x86's VM teardown code, which has accumulated
   a lot of cruft over the years.  Particularly, destroy vCPUs before
   the MMU, despite the latter being a VM-wide operation.
 
 * Add common secure TSC infrastructure for use within SNP and in the
   future TDX
 
 * Block KVM_CAP_SYNC_REGS if guest state is protected.  It does not make
   sense to use the capability if the relevant registers are not
   available for reading or writing.
 
 * Don't take kvm->lock when iterating over vCPUs in the suspend notifier to
   fix a largely theoretical deadlock.
 
 * Use the vCPU's actual Xen PV clock information when starting the Xen timer,
   as the cached state in arch.hv_clock can be stale/bogus.
 
 * Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across different
   PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as KVM's suspend
   notifier only accounts for kvmclock, and there's no evidence that the
   flag is actually supported by Xen guests.
 
 * Clean up the per-vCPU "cache" of its reference pvclock, and instead only
   track the vCPU's TSC scaling (multipler+shift) metadata (which is moderately
   expensive to compute, and rarely changes for modern setups).
 
 * Don't write to the Xen hypercall page on MSR writes that are initiated by
   the host (userspace or KVM) to fix a class of bugs where KVM can write to
   guest memory at unexpected times, e.g. during vCPU creation if userspace has
   set the Xen hypercall MSR index to collide with an MSR that KVM emulates.
 
 * Restrict the Xen hypercall MSR index to the unofficial synthetic range to
   reduce the set of possible collisions with MSRs that are emulated by KVM
   (collisions can still happen as KVM emulates Hyper-V MSRs, which also reside
   in the synthetic range).
 
 * Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config.
 
 * Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID
   entries when updating PV clocks; there is no guarantee PV clocks will be
   updated between TSC frequency changes and CPUID emulation, and guest reads
   of the TSC leaves should be rare, i.e. are not a hot path.
 
 x86 (Intel):
 
 * Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and thus
   modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1.
 
 * Pass XFD_ERR as the payload when injecting #NM, as a preparatory step
   for upcoming FRED virtualization support.
 
 * Decouple the EPT entry RWX protection bit macros from the EPT Violation
   bits, both as a general cleanup and in anticipation of adding support for
   emulating Mode-Based Execution Control (MBEC).
 
 * Reject KVM_RUN if userspace manages to gain control and stuff invalid guest
   state while KVM is in the middle of emulating nested VM-Enter.
 
 * Add a macro to handle KVM's sanity checks on entry/exit VMCS control pairs
   in anticipation of adding sanity checks for secondary exit controls (the
   primary field is out of bits).
 
 x86 (AMD):
 
 * Ensure the PSP driver is initialized when both the PSP and KVM modules are
   built-in (the initcall framework doesn't handle dependencies).
 
 * Use long-term pins when registering encrypted memory regions, so that the
   pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and don't lead to
   excessive fragmentation.
 
 * Add macros and helpers for setting GHCB return/error codes.
 
 * Add support for Idle HLT interception, which elides interception if the vCPU
   has a pending, unmasked virtual IRQ when HLT is executed.
 
 * Fix a bug in INVPCID emulation where KVM fails to check for a non-canonical
   address.
 
 * Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is invalid, e.g.
   because the vCPU was "destroyed" via SNP's AP Creation hypercall.
 
 * Reject SNP AP Creation if the requested SEV features for the vCPU don't
   match the VM's configured set of features.
 
 Selftests:
 
 * Fix again the Intel PMU counters test; add a data load and do CLFLUSH{OPT} on the data
   instead of executing code.  The theory is that modern Intel CPUs have
   learned new code prefetching tricks that bypass the PMU counters.
 
 * Fix a flaw in the Intel PMU counters test where it asserts that an event is
   counting correctly without actually knowing what the event counts on the
   underlying hardware.
 
 * Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and
   improve its coverage by collecting all dirty entries on each iteration.
 
 * Fix a few minor bugs related to handling of stats FDs.
 
 * Add infrastructure to make vCPU and VM stats FDs available to tests by
   default (open the FDs during VM/vCPU creation).
 
 * Relax an assertion on the number of HLT exits in the xAPIC IPI test when
   running on a CPU that supports AMD's Idle HLT (which elides interception of
   HLT if a virtual IRQ is pending and unmasked).
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmfcTkEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMnQAf/cPx72hJOdNy4Qrm8M33YLXVRVV00
 yEZ8eN8TWdOclr0ltE/w/ELGh/qS4CU8pjURAk0A6lPioU+mdcTn3dPEqMDMVYom
 uOQ2lusEHw0UuSnGZSEjvZJsE/Ro2NSAsHIB6PWRqig1ZBPJzyu0frce34pMpeQH
 diwriJL9lKPAhBWXnUQ9BKoi1R0P5OLW9ahX4SOWk7cAFg4DLlDE66Nqf6nKqViw
 DwEucTiUEg5+a3d93gihdD4JNl+fb3vI2erxrMxjFjkacl0qgqRu3ei3DG0MfdHU
 wNcFSG5B1n0OECKxr80lr1Ip1KTVNNij0Ks+w6Gc6lSg9c4PptnNkfLK3A==
 =nnCN
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Nested virtualization support for VGICv3, giving the nested
     hypervisor control of the VGIC hardware when running an L2 VM

   - Removal of 'late' nested virtualization feature register masking,
     making the supported feature set directly visible to userspace

   - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
     of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers

   - Paravirtual interface for discovering the set of CPU
     implementations where a VM may run, addressing a longstanding issue
     of guest CPU errata awareness in big-little systems and
     cross-implementation VM migration

   - Userspace control of the registers responsible for identifying a
     particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
     allowing VMs to be migrated cross-implementation

   - pKVM updates, including support for tracking stage-2 page table
     allocations in the protected hypervisor in the 'SecPageTable' stat

   - Fixes to vPMU, ensuring that userspace updates to the vPMU after
     KVM_RUN are reflected into the backing perf events

  LoongArch:

   - Remove unnecessary header include path

   - Assume constant PGD during VM context switch

   - Add perf events support for guest VM

  RISC-V:

   - Disable the kernel perf counter during configure

   - KVM selftests improvements for PMU

   - Fix warning at the time of KVM module removal

  x86:

   - Add support for aging of SPTEs without holding mmu_lock.

     Not taking mmu_lock allows multiple aging actions to run in
     parallel, and more importantly avoids stalling vCPUs. This includes
     an implementation of per-rmap-entry locking; aging the gfn is done
     with only a per-rmap single-bin spinlock taken, whereas locking an
     rmap for write requires taking both the per-rmap spinlock and the
     mmu_lock.

     Note that this decreases slightly the accuracy of accessed-page
     information, because changes to the SPTE outside aging might not
     use atomic operations even if they could race against a clear of
     the Accessed bit.

     This is deliberate because KVM and mm/ tolerate false
     positives/negatives for accessed information, and testing has shown
     that reducing the latency of aging is far more beneficial to
     overall system performance than providing "perfect" young/old
     information.

   - Defer runtime CPUID updates until KVM emulates a CPUID instruction,
     to coalesce updates when multiple pieces of vCPU state are
     changing, e.g. as part of a nested transition

   - Fix a variety of nested emulation bugs, and add VMX support for
     synthesizing nested VM-Exit on interception (instead of injecting
     #UD into L2)

   - Drop "support" for async page faults for protected guests that do
     not set SEND_ALWAYS (i.e. that only want async page faults at CPL3)

   - Bring a bit of sanity to x86's VM teardown code, which has
     accumulated a lot of cruft over the years. Particularly, destroy
     vCPUs before the MMU, despite the latter being a VM-wide operation

   - Add common secure TSC infrastructure for use within SNP and in the
     future TDX

   - Block KVM_CAP_SYNC_REGS if guest state is protected. It does not
     make sense to use the capability if the relevant registers are not
     available for reading or writing

   - Don't take kvm->lock when iterating over vCPUs in the suspend
     notifier to fix a largely theoretical deadlock

   - Use the vCPU's actual Xen PV clock information when starting the
     Xen timer, as the cached state in arch.hv_clock can be stale/bogus

   - Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across
     different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as
     KVM's suspend notifier only accounts for kvmclock, and there's no
     evidence that the flag is actually supported by Xen guests

   - Clean up the per-vCPU "cache" of its reference pvclock, and instead
     only track the vCPU's TSC scaling (multipler+shift) metadata (which
     is moderately expensive to compute, and rarely changes for modern
     setups)

   - Don't write to the Xen hypercall page on MSR writes that are
     initiated by the host (userspace or KVM) to fix a class of bugs
     where KVM can write to guest memory at unexpected times, e.g.
     during vCPU creation if userspace has set the Xen hypercall MSR
     index to collide with an MSR that KVM emulates

   - Restrict the Xen hypercall MSR index to the unofficial synthetic
     range to reduce the set of possible collisions with MSRs that are
     emulated by KVM (collisions can still happen as KVM emulates
     Hyper-V MSRs, which also reside in the synthetic range)

   - Clean up and optimize KVM's handling of Xen MSR writes and
     xen_hvm_config

   - Update Xen TSC leaves during CPUID emulation instead of modifying
     the CPUID entries when updating PV clocks; there is no guarantee PV
     clocks will be updated between TSC frequency changes and CPUID
     emulation, and guest reads of the TSC leaves should be rare, i.e.
     are not a hot path

  x86 (Intel):

   - Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and
     thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1

   - Pass XFD_ERR as the payload when injecting #NM, as a preparatory
     step for upcoming FRED virtualization support

   - Decouple the EPT entry RWX protection bit macros from the EPT
     Violation bits, both as a general cleanup and in anticipation of
     adding support for emulating Mode-Based Execution Control (MBEC)

   - Reject KVM_RUN if userspace manages to gain control and stuff
     invalid guest state while KVM is in the middle of emulating nested
     VM-Enter

   - Add a macro to handle KVM's sanity checks on entry/exit VMCS
     control pairs in anticipation of adding sanity checks for secondary
     exit controls (the primary field is out of bits)

  x86 (AMD):

   - Ensure the PSP driver is initialized when both the PSP and KVM
     modules are built-in (the initcall framework doesn't handle
     dependencies)

   - Use long-term pins when registering encrypted memory regions, so
     that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and
     don't lead to excessive fragmentation

   - Add macros and helpers for setting GHCB return/error codes

   - Add support for Idle HLT interception, which elides interception if
     the vCPU has a pending, unmasked virtual IRQ when HLT is executed

   - Fix a bug in INVPCID emulation where KVM fails to check for a
     non-canonical address

   - Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is
     invalid, e.g. because the vCPU was "destroyed" via SNP's AP
     Creation hypercall

   - Reject SNP AP Creation if the requested SEV features for the vCPU
     don't match the VM's configured set of features

  Selftests:

   - Fix again the Intel PMU counters test; add a data load and do
     CLFLUSH{OPT} on the data instead of executing code. The theory is
     that modern Intel CPUs have learned new code prefetching tricks
     that bypass the PMU counters

   - Fix a flaw in the Intel PMU counters test where it asserts that an
     event is counting correctly without actually knowing what the event
     counts on the underlying hardware

   - Fix a variety of flaws, bugs, and false failures/passes
     dirty_log_test, and improve its coverage by collecting all dirty
     entries on each iteration

   - Fix a few minor bugs related to handling of stats FDs

   - Add infrastructure to make vCPU and VM stats FDs available to tests
     by default (open the FDs during VM/vCPU creation)

   - Relax an assertion on the number of HLT exits in the xAPIC IPI test
     when running on a CPU that supports AMD's Idle HLT (which elides
     interception of HLT if a virtual IRQ is pending and unmasked)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits)
  RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed
  RISC-V: KVM: Teardown riscv specific bits after kvm_exit
  LoongArch: KVM: Register perf callbacks for guest
  LoongArch: KVM: Implement arch-specific functions for guest perf
  LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel()
  LoongArch: KVM: Remove PGD saving during VM context switch
  LoongArch: KVM: Remove unnecessary header include path
  KVM: arm64: Tear down vGIC on failed vCPU creation
  KVM: arm64: PMU: Reload when resetting
  KVM: arm64: PMU: Reload when user modifies registers
  KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
  KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
  KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
  KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
  KVM: arm64: Initialize HCRX_EL2 traps in pKVM
  KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
  KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected
  KVM: x86: Add infrastructure for secure TSC
  KVM: x86: Push down setting vcpu.arch.user_set_tsc
  ...
2025-03-25 14:22:07 -07:00
Jakub Kicinski 4f74a45c6b bluetooth-next pull request for net-next:
core:
 
  - Add support for skb TX SND/COMPLETION timestamping
  - hci_core: Enable buffer flow control for SCO/eSCO
  - coredump: Log devcd dumps into the monitor
 
  drivers:
 
  - btusb: Add 2 HWIDs for MT7922
  - btusb: Fix regression in the initialization of fake Bluetooth controllers
  - btusb: Add 14 USB device IDs for Qualcomm WCN785x
  - btintel: Add support for Intel Scorpius Peak
  - btintel: Add support to configure TX power
  - btintel: Add DSBR support for ScP
  - btintel_pcie: Add device id of Whale Peak
  - btintel_pcie: Setup buffers for firmware traces
  - btintel_pcie: Read hardware exception data
  - btintel_pcie: Add support for device coredump
  - btintel_pcie: Trigger device coredump on hardware exception
  - btnxpuart: Support for controller wakeup gpio config
  - btnxpuart: Add support to set BD address
  - btnxpuart: Add correct bootloader error codes
  - btnxpuart: Handle bootloader error during cmd5 and cmd7
  - btnxpuart: Fix kernel panic during FW release
  - qca: add WCN3950 support
  - hci_qca: use the power sequencer for wcn6750
  - btmtksdio: Prevent enabling interrupts after IRQ handler removal
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmfjA3AZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKeWWD/0QZYBrNP9QSTyYTeNlYhCC
 Lw/n7n3+LhxOqIu+tOGS7UplTqR3p4WQGu2g+XV9wSu4dvLplZGn/40XtiXJA0+r
 VsXivQ4IR/Vjd8sNLLixfmuH4g4CbMSblQvECD3/5wFTeSH8T6/gyts/WV/LDYOJ
 jp5kG6HCFHMv7RaiaHdZ0Pe4c1xJ/Ek8TnrW4G/kPBaLzm+lhjRkfzxCx8cO0k0H
 mpoheUohLtUSgfLf49826t7rp3HDuX9db2hiGXQfKSrL2milwufKNaMFTmbuU2Uq
 IyAwR1CEdSsKlcpbnVNF05r94sjf8NjuBD3YWxB9OfVXq9aymJYZdGh54XT5nF3f
 ccD1WNnOoTsXDbEAnuVx+EYrJAI70e7vE4m8XbpxJcxhFoSIQCmtDoXUY199LiEG
 El7DlwouNFESmOIj6gSK7ogUEhmcQA0AJxBVpdxnGBAcQMn1hrGSb75VGg7rul8K
 HcBtf5j4gxZum2fzrufeY1+eBxfSRjNFIsnutAr53gEtidPZYlmiRyAuqQJMo2sO
 0hlpC6gQGQJ5HiTR5Jbo9u+OC1oeHHXNN5IpNrd6J65n0pqCRldDsoXCerEJsOou
 cqWzb9lfQCrjRnWRxUWOLukdYRgBH15wuORU+Z7/qVhqOr49RvARnzq7AjGSenTS
 YKb2N+NGlqpxG+vQscFJUA==
 =XCdn
 -----END PGP SIGNATURE-----

Merge tag 'for-net-next-2025-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

core:

 - Add support for skb TX SND/COMPLETION timestamping
 - hci_core: Enable buffer flow control for SCO/eSCO
 - coredump: Log devcd dumps into the monitor

 drivers:

 - btusb: Add 2 HWIDs for MT7922
 - btusb: Fix regression in the initialization of fake Bluetooth controllers
 - btusb: Add 14 USB device IDs for Qualcomm WCN785x
 - btintel: Add support for Intel Scorpius Peak
 - btintel: Add support to configure TX power
 - btintel: Add DSBR support for ScP
 - btintel_pcie: Add device id of Whale Peak
 - btintel_pcie: Setup buffers for firmware traces
 - btintel_pcie: Read hardware exception data
 - btintel_pcie: Add support for device coredump
 - btintel_pcie: Trigger device coredump on hardware exception
 - btnxpuart: Support for controller wakeup gpio config
 - btnxpuart: Add support to set BD address
 - btnxpuart: Add correct bootloader error codes
 - btnxpuart: Handle bootloader error during cmd5 and cmd7
 - btnxpuart: Fix kernel panic during FW release
 - qca: add WCN3950 support
 - hci_qca: use the power sequencer for wcn6750
 - btmtksdio: Prevent enabling interrupts after IRQ handler removal

* tag 'for-net-next-2025-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (53 commits)
  Bluetooth: MGMT: Add LL Privacy Setting
  Bluetooth: hci_event: Fix handling of HCI_EV_LE_DIRECT_ADV_REPORT
  Bluetooth: btnxpuart: Fix kernel panic during FW release
  Bluetooth: btnxpuart: Handle bootloader error during cmd5 and cmd7
  Bluetooth: btnxpuart: Add correct bootloader error codes
  t blameBluetooth: btintel: Fix leading white space
  Bluetooth: btintel: Add support to configure TX power
  Bluetooth: btmtksdio: Prevent enabling interrupts after IRQ handler removal
  Bluetooth: btmtk: Remove the resetting step before downloading the fw
  Bluetooth: SCO: add TX timestamping
  Bluetooth: L2CAP: add TX timestamping
  Bluetooth: ISO: add TX timestamping
  Bluetooth: add support for skb TX SND/COMPLETION timestamping
  net-timestamp: COMPLETION timestamp on packet tx completion
  HCI: coredump: Log devcd dumps into the monitor
  Bluetooth: HCI: Add definition of hci_rp_remote_name_req_cancel
  Bluetooth: hci_vhci: Mark Sync Flow Control as supported
  Bluetooth: hci_core: Enable buffer flow control for SCO/eSCO
  Bluetooth: btintel_pci: Fix build warning
  Bluetooth: btintel_pcie: Trigger device coredump on hardware exception
  ...
====================

Link: https://patch.msgid.link/20250325192925.2497890-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 14:00:48 -07:00
Linus Torvalds 317a76a996 Updates for the VDSO infrastructure:
- Consolidate the VDSO storage
 
     The VDSO data storage and data layout has been largely architecture
     specific for historical reasons. That increases the maintenance effort
     and causes inconsistencies over and over.
 
     There is no real technical reason for architecture specific layouts and
     implementations. The architecture specific details can easily be
     integrated into a generic layout, which also reduces the amount of
     duplicated code for managing the mappings.
 
     Convert all architectures over to a unified layout and common mapping
     infrastructure. This splits the VDSO data layout into subsystem
     specific blocks, timekeeping, random and architecture parts, which
     provides a better structure and allows to improve and update the
     functionalities without conflict and interaction.
 
   - Rework the timekeeping data storage
 
     The current implementation is designed for exposing system timekeeping
     accessors, which was good enough at the time when it was designed.
 
     PTP and Time Sensitive Networking (TSN) change that as there are
     requirements to expose independent PTP clocks, which are not related to
     system timekeeping.
 
     Replace the monolithic data storage by a structured layout, which
     allows to add support for independent PTP clocks on top while reusing
     both the data structures and the time accessor implementations.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfgSWUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYGED/0f/M8YyacAyErDYW4ufW+zh2sUidSf
 GVlK0Jn5BMljOoye+y2XfTxuvvXxEDjJNYiJm2uKGPdV29tjNXreGK39XyNqXPu5
 jwR4f/IN/QVSM2nCO6jyydMz8ympJ2k6M4RewwmxXBL2KsUzzJWSKTgRNqM5Tdjs
 1RhJMjkQVTiiSYerBpHXYCeZLM7/VEfZ120uuzVAYPXo0/R6zuyF7IBgIao9hbfO
 IQeCMLLfpDQHQhwquTA8ZbWqQusiEoSYHT+kTDa3eXDDbE/2UklAUs9gaatI979x
 73zs0Yqxyx2iIGaghACWOAbKdcBWBeCYDw5fFwYVKn4VMQi1+wcxbtOYL767jp9o
 vfkLXGilXcVkvDjv4fH+e1NoJXXBxq1Ug1silKdOeJzenQF8Q1i3tavkWUVCNfwH
 qyOIM72NiCEWbYBDcz0lwBxEAyO4o0E6NP1bDc4y50VedEYIbXwSh0QGrdev1abn
 rjY9vsuUR9oznmZ6BRPPxMTY87gOSHoKvqydgSZUACEgLV9346f5qZf341OReYai
 MXUmXOM4+LdyaM1+Mec8ppvjMbLw+736NZyZtT2InusEBE+Ddp25L3hYiWnklJu8
 2uwv0AoyrwaJ8y6ADOX4thcLZq0gND0Z/Ayz/XvpeI30eftsGUCt5KOVlqwfwOkI
 4EQKvk2fAixPxg==
 =rwei
 -----END PGP SIGNATURE-----

Merge tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull VDSO infrastructure updates from Thomas Gleixner:

 - Consolidate the VDSO storage

   The VDSO data storage and data layout has been largely architecture
   specific for historical reasons. That increases the maintenance
   effort and causes inconsistencies over and over.

   There is no real technical reason for architecture specific layouts
   and implementations. The architecture specific details can easily be
   integrated into a generic layout, which also reduces the amount of
   duplicated code for managing the mappings.

   Convert all architectures over to a unified layout and common mapping
   infrastructure. This splits the VDSO data layout into subsystem
   specific blocks, timekeeping, random and architecture parts, which
   provides a better structure and allows to improve and update the
   functionalities without conflict and interaction.

 - Rework the timekeeping data storage

   The current implementation is designed for exposing system
   timekeeping accessors, which was good enough at the time when it was
   designed.

   PTP and Time Sensitive Networking (TSN) change that as there are
   requirements to expose independent PTP clocks, which are not related
   to system timekeeping.

   Replace the monolithic data storage by a structured layout, which
   allows to add support for independent PTP clocks on top while reusing
   both the data structures and the time accessor implementations.

* tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  sparc/vdso: Always reject undefined references during linking
  x86/vdso: Always reject undefined references during linking
  vdso: Rework struct vdso_time_data and introduce struct vdso_clock
  vdso: Move architecture related data before basetime data
  powerpc/vdso: Prepare introduction of struct vdso_clock
  arm64/vdso: Prepare introduction of struct vdso_clock
  x86/vdso: Prepare introduction of struct vdso_clock
  time/namespace: Prepare introduction of struct vdso_clock
  vdso/namespace: Rename timens_setup_vdso_data() to reflect new vdso_clock struct
  vdso/vsyscall: Prepare introduction of struct vdso_clock
  vdso/gettimeofday: Prepare helper functions for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_coarse_timens() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_coarse() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_hres_timens() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_hres() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare introduction of struct vdso_clock
  vdso/helpers: Prepare introduction of struct vdso_clock
  vdso/datapage: Define vdso_clock to prepare for multiple PTP clocks
  vdso: Make vdso_time_data cacheline aligned
  arm64: Make asm/cache.h compatible with vDSO
  ...
2025-03-25 11:30:42 -07:00
Linus Torvalds d5048d1176 Updates for the core time/timer subsystem:
- Fix a memory ordering issue in posix-timers
 
     Posix-timer lookup is lockless and reevaluates the timer validity under
     the timer lock, but the update which validates the timer is not
     protected by the timer lock. That allows the store to be reordered
     against the initialization stores, so that the lookup side can observe
     a partially initialized timer. That's mostly a theoretical problem, but
     incorrect nevertheless.
 
   - Fix a long standing inconsistency of the coarse time getters
 
     The coarse time getters read the base time of the current update cycle
     without reading the actual hardware clock. NTP frequency adjustment can
     set the base time backwards. The fine grained interfaces compensate
     this by reading the clock and applying the new conversion factor, but
     the coarse grained time getters use the base time directly. That allows
     the user to observe time going backwards.
 
     Cure it by always forwarding base time, when NTP changes the frequency
     with an immediate step.
 
   - Rework of posix-timer hashing
 
     The posix-timer hash is not scalable and due to the CRIU timer restore
     mechanism prone to massive contention on the global hash bucket lock.
 
     Replace the global hash lock with a fine grained per bucket locking
     scheme to address that.
 
   - Rework the proc/$PID/timers interface.
 
     /proc/$PID/timers is provided for CRIU to be able to restore a
     timer. The printout happens with sighand lock held and interrupts
     disabled. That's not required as this can be done with RCU protection
     as well.
 
   - Provide a sane mechanism for CRIU to restore a timer ID
 
     CRIU restores timers by creating and deleting them until the kernel
     internal per process ID counter reached the requested ID. That's
     horribly slow for sparse timer IDs.
 
     Provide a prctl() which allows CRIU to restore a timer with a given
     ID. When enabled the ID pointer is used as input pointer to read the
     requested ID from user space. When disabled, the normal allocation
     scheme (next ID) is active as before. This is backwards compatible for
     both kernel and user space.
 
   - Make hrtimer_update_function() less expensive.
 
     The sanity checks are valuable, but expensive for high frequency usage
     in io/uring. Make the debug checks conditional and enable them only
     when lockdep is enabled.
 
   - Small updates, cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfgQ6wTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoeQzD/9p+EuUGrMbSNaLVMCYFULBbR0lersJ
 hrGGoKUsNt5T+f6hEEbSLBnkjZcMIj0J+mdIEUiRa73ryw1KmwLk/8MBu0c6u6q3
 musDvJqt3dLTG98yN0YeWK3tJDxhSjxIpwcAXusPQ04j16I2fVXFzDQ/kGPq6MTI
 tdMYzsS3wjuWpi+CbgRSP2HEwu08fIDVsQ7Grynh4Kmd31apne4ZgF2UVp6UiZyp
 8yJHZgVzJcFs7Y3MS6XTgezHnuADxMY1irzbXmok19941X8mZz2QRIpGQX+oMh6o
 g7SG2lj9i8YbLqU9/5RbC5ppjRcWfogDpW0Lk+OmdOpr0RiXTmx5Lz8Egxex9wG5
 pUJszeTY+bLw7mmYmkGZyBz+PNoGgVM5KFZRe5ENvYM8Gy8LUW5DA9zvxeHqDDz1
 FiMmKdYrwr8VCKqx+8hJQdzlzRbepxq9sNzDdMKVOUcFdGUVWekfG6ZFkfLKxwzA
 XDTKJilzXbAAj4r57vEvOCYLUZH/ZsFK4yyg0O53fEg6fj87EbTDb5+YUGazb3+C
 yNTEOQIT8LtutzLR9+xeLi92k+6zlJ4c1PfqBx5Kv/TwBrIfV1P8N2c6TCOWDoRM
 AOvo2SXEA/jEPix2GjT5jalSV1mROEXo2T9/G7kz4H7K+DkI/dGgS9mXyUDO2mMd
 ouOxYN0GohVqTQ==
 =XUGH
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer core updates from Thomas Gleixner:

 - Fix a memory ordering issue in posix-timers

   Posix-timer lookup is lockless and reevaluates the timer validity
   under the timer lock, but the update which validates the timer is not
   protected by the timer lock. That allows the store to be reordered
   against the initialization stores, so that the lookup side can
   observe a partially initialized timer. That's mostly a theoretical
   problem, but incorrect nevertheless.

 - Fix a long standing inconsistency of the coarse time getters

   The coarse time getters read the base time of the current update
   cycle without reading the actual hardware clock. NTP frequency
   adjustment can set the base time backwards. The fine grained
   interfaces compensate this by reading the clock and applying the new
   conversion factor, but the coarse grained time getters use the base
   time directly. That allows the user to observe time going backwards.

   Cure it by always forwarding base time, when NTP changes the
   frequency with an immediate step.

 - Rework of posix-timer hashing

   The posix-timer hash is not scalable and due to the CRIU timer
   restore mechanism prone to massive contention on the global hash
   bucket lock.

   Replace the global hash lock with a fine grained per bucket locking
   scheme to address that.

 - Rework the proc/$PID/timers interface.

   /proc/$PID/timers is provided for CRIU to be able to restore a timer.
   The printout happens with sighand lock held and interrupts disabled.
   That's not required as this can be done with RCU protection as well.

 - Provide a sane mechanism for CRIU to restore a timer ID

   CRIU restores timers by creating and deleting them until the kernel
   internal per process ID counter reached the requested ID. That's
   horribly slow for sparse timer IDs.

   Provide a prctl() which allows CRIU to restore a timer with a given
   ID. When enabled the ID pointer is used as input pointer to read the
   requested ID from user space. When disabled, the normal allocation
   scheme (next ID) is active as before. This is backwards compatible
   for both kernel and user space.

 - Make hrtimer_update_function() less expensive.

   The sanity checks are valuable, but expensive for high frequency
   usage in io/uring. Make the debug checks conditional and enable them
   only when lockdep is enabled.

 - Small updates, cleanups and improvements

* tag 'timers-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  selftests/timers: Improve skew_consistency by testing with other clockids
  timekeeping: Fix possible inconsistencies in _COARSE clockids
  posix-timers: Drop redundant memset() invocation
  selftests/timers/posix-timers: Add a test for exact allocation mode
  posix-timers: Provide a mechanism to allocate a given timer ID
  posix-timers: Dont iterate /proc/$PID/timers with sighand:: Siglock held
  posix-timers: Make per process list RCU safe
  posix-timers: Avoid false cacheline sharing
  posix-timers: Switch to jhash32()
  posix-timers: Improve hash table performance
  posix-timers: Make signal_struct:: Next_posix_timer_id an atomic_t
  posix-timers: Make lock_timer() use guard()
  posix-timers: Rework timer removal
  posix-timers: Simplify lock/unlock_timer()
  posix-timers: Use guards in a few places
  posix-timers: Remove SLAB_PANIC from kmem cache
  posix-timers: Remove a few paranoid warnings
  posix-timers: Cleanup includes
  posix-timers: Add cond_resched() to posix_timer_add() search loop
  posix-timers: Initialise timer before adding it to the hash table
  ...
2025-03-25 10:33:23 -07:00
Pauli Virtanen 983e0e4e87 net-timestamp: COMPLETION timestamp on packet tx completion
Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp
when hardware reports a packet completed.

Completion tstamp is useful for Bluetooth, as hardware timestamps do not
exist in the HCI specification except for ISO packets, and the hardware
has a queue where packets may wait.  In this case the software SND
timestamp only reflects the kernel-side part of the total latency
(usually small) and queue length (usually 0 unless HW buffers
congested), whereas the completion report time is more informative of
the true latency.

It may also be useful in other cases where HW TX timestamps cannot be
obtained and user wants to estimate an upper bound to when the TX
probably happened.

Signed-off-by: Pauli Virtanen <pav@iki.fi>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-03-25 12:48:05 -04:00
Akihiko Odaki 976c2696b7 virtio_net: Split struct virtio_net_rss_config
struct virtio_net_rss_config was less useful in actual code because of a
flexible array placed in the middle. Add new structures that split it
into two to avoid having a flexible array in the middle.

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20250321-virtio-v2-1-33afb8f4640b@daynix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 09:30:21 -07:00
Jakub Kicinski 1952e19c02 More features for 6.15, major changes:
* cfg80211/mac80211: fix and enable link reconfiguration
  * rtw88: support RTL8814AE/RTL8814AU
  * mt7996: preparations for MLO
  * ath12k: continued work on MLO
  * iwlwifi: add new iwlmld sub-driver/op-mode for
    some current and future devices
  * wfx: wowlan support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmfcE18ACgkQ10qiO8sP
 aAAW4RAAlu0ehfMYvDmUifEo2q7NXVjou1D7u3gGNlwSR/Shrftn2k47n+tThARb
 25ZutatgPlt4b+RjzeoSa724xLlDOQAG7e36tV7xZNDgiWRK6VrF+JVzFxin7cgQ
 e3yXcgf5luLiUNWocCZXc2Vr4MGVrSFcqLAFZXuQPDzVfU8ClKObUsJkEX59PsFv
 Sob5b4VzwAyBXUlUzMjSWxdAsUV04p0NBfAv0HlEuc+XboHwMZDkXJokrPz9YCpz
 TiT+hySDxeFyeXADiXQsYUAVo8LwNUNlehqEc+yzXsuZb6daCiGQiZW0wN15uxCb
 vrfYawJO05QsWtnmG6gg91gh6FwFnd0XIPjrveqzhsAqbA+AJpHqPT9lUBe3K3C2
 PObDpz5NLRnGhO62Rt0au/wkoXGLe3sro5fOy1G0v2YMEvTIvEOByxEtlh0124zM
 C9s28KRV3PqZ4+8YmTjfc/enTVzh8vFFiyj7b//xtclBbFJZezJ1iDTHj1gEuIiQ
 /Npjvu7zXMYxmtGbUfTxoZ4AxzO7wk2j6oD3b+prI3DrthRjzq9n1j7bnmlAukyw
 miFm5f0ss0TrS+dOnE8CysNz8EkYJGDjtoGh9U3Bej+q2HSYK1YwcL2BeW+v28qV
 2rqyOQbftxXHWs0n2aEzkzGwnTYKCY7kR1+CMh11GQ7+5vKBVss=
 =HSDM
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2025-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
More features for 6.15, major changes:
 * cfg80211/mac80211: fix and enable link reconfiguration
 * rtw88: support RTL8814AE/RTL8814AU
 * mt7996: preparations for MLO
 * ath12k: continued work on MLO
 * iwlwifi: add new iwlmld sub-driver/op-mode for
   some current and future devices
 * wfx: wowlan support

* tag 'wireless-next-2025-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (311 commits)
  wifi: mt76: mt7996: fix locking in mt7996_mac_sta_rc_work()
  wifi: mt76: mt76x2u: add TP-Link TL-WDN6200 ID to device table
  wifi: mt76: mt792x: re-register CHANCTX_STA_CSA only for the mt7921 series
  wifi: mt76: mt7996: Update mt7996_tx to MLO support
  wifi: mt76: mt7996: rework mt7996_ampdu_action to support MLO
  wifi: mt76: mt7996: rework set/get_tsf callabcks to support MLO
  wifi: mt76: mt7996: set vif default link_id adding/removing vif links
  wifi: mt76: mt7996: rework mt7996_mcu_beacon_inband_discov to support MLO
  wifi: mt76: mt7996: rework mt7996_mcu_add_obss_spr to support MLO
  wifi: mt76: mt7996: rework mt7996_net_fill_forward_path to support MLO
  wifi: mt76: mt7996: rework mt7996_update_mu_group to support MLO
  wifi: mt76: mt7996: rework mt7996_mac_sta_poll to support MLO
  wifi: mt76: mt7996: rework mt7996_mac_sta_rc_work to support MLO
  wifi: mt76: mt7996: remove mt7996_mac_enable_rtscts()
  wifi: mt76: mt7996: rework mt7996_sta_hw_queue_read to support MLO
  wifi: mt76: mt7996: rework mt7996_set_hw_key to support MLO
  wifi: mt76: mt7996: Add mt7996_sta_link to mt7996_mcu_add_bss_info signature
  wifi: mt76: mt7996: rework mt7996_sta_set_4addr and mt7996_sta_set_decap_offload to support MLO
  wifi: mt76: mt7996: rework mt7996_rx_get_wcid to support MLO
  wifi: mt76: mt7996: Rely on wcid_to_sta in mt7996_mac_add_txs_skb()
  ...
====================

Link: https://patch.msgid.link/20250320131106.33266-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 08:04:13 -07:00
Eric Dumazet 652e2c7778 net: reorganize IP MIB values (II)
Commit 14a1968074 ("net: reorganize IP MIB values") changed
MIB values to group hot fields together.

Since then 5 new fields have been added without caring about
data locality.

This patch moves IPSTATS_MIB_OUTPKTS, IPSTATS_MIB_NOECTPKTS,
IPSTATS_MIB_ECT1PKTS, IPSTATS_MIB_ECT0PKTS, IPSTATS_MIB_CEPKTS
to the hot portion of per-cpu data.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250320101434.3174412-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 07:37:53 -07:00
Yi Liu ad744ed5dd vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid
This extends the VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT ioctls to attach/detach
a given pasid of a vfio device to/from an IOAS/HWPT.

Link: https://patch.msgid.link/r/20250321180143.8468-4-yi.l.liu@intel.com
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25 10:18:31 -03:00
Yi Liu dbc5f37b4f iommufd: Allow allocating PASID-compatible domain
The underlying infrastructure has supported the PASID attach and related
enforcement per the requirement of the IOMMU_HWPT_ALLOC_PASID flag. This
extends iommufd to support PASID compatible domain requested by userspace.

Link: https://patch.msgid.link/r/20250321171940.7213-15-yi.l.liu@intel.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25 10:18:31 -03:00
Jason Xing 9552f90835 tcp: support TCP_DELACK_MAX_US for set/getsockopt use
Support adjusting/reading delayed ack max for socket level by using
set/getsockopt().

This option aligns with TCP_BPF_DELACK_MAX usage. Considering that bpf
option was implemented before this patch, so we need to use a standalone
new option for pure tcp set/getsockopt() use.

Add WRITE_ONCE/READ_ONCE() to prevent data-race if setsockopt()
happens to write one value to icsk_delack_max while icsk_delack_max is
being read.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 04:27:20 -07:00
Jason Xing f38805c5d2 tcp: support TCP_RTO_MIN_US for set/getsockopt use
Support adjusting/reading RTO MIN for socket level by using set/getsockopt().

This new option has the same effect as TCP_BPF_RTO_MIN, which means it
doesn't affect RTAX_RTO_MIN usage (by using ip route...). Considering that
bpf option was implemented before this patch, so we need to use a standalone
new option for pure tcp set/getsockopt() use.

When the socket is created, its icsk_rto_min is set to the default
value that is controlled by sysctl_tcp_rto_min_us. Then if application
calls setsockopt() with TCP_RTO_MIN_US flag to pass a valid value, then
icsk_rto_min will be overridden in jiffies unit.

This patch adds WRITE_ONCE/READ_ONCE to avoid data-race around
icsk_rto_min.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250317120314.41404-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 04:27:19 -07:00
Linus Torvalds 327ecdbc0f Performance events updates for v6.15:
Core:
 
   - Move perf_event sysctls into kernel/events/ (Joel Granados)
   - Use POLLHUP for pinned events in error (Namhyung Kim)
   - Avoid the read if the count is already updated (Peter Zijlstra)
   - Allow the EPOLLRDNORM flag for poll (Tao Chen)
 
   - locking/percpu-rwsem: Add guard support (Peter Zijlstra)
     [ NOTE: this got (mis-)merged into the perf tree due to related work. ]
 
 perf_pmu_unregister() related improvements: (Peter Zijlstra)
 
   - Simplify the perf_event_alloc() error path
   - Simplify the perf_pmu_register() error path
   - Simplify perf_pmu_register()
   - Simplify perf_init_event()
   - Simplify perf_event_alloc()
   - Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count
   - Add this_cpc() helper
   - Introduce perf_free_addr_filters()
   - Robustify perf_event_free_bpf_prog()
   - Simplify the perf_mmap() control flow
   - Further simplify perf_mmap()
   - Remove retry loop from perf_mmap()
   - Lift event->mmap_mutex in perf_mmap()
   - Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes
   - Fix perf_mmap() failure path
 
 Uprobes:
 
   - Harden x86 uretprobe syscall trampoline check (Jiri Olsa)
   - Remove redundant spinlock in uprobe_deny_signal() (Liao Chang)
   - Remove the spinlock within handle_singlestep() (Liao Chang)
 
 x86 Intel PMU enhancements:
 
   - Support PEBS counters snapshotting (Kan Liang)
   - Fix intel_pmu_read_event() (Kan Liang)
   - Extend per event callchain limit to branch stack (Kan Liang)
   - Fix system-wide LBR profiling (Kan Liang)
   - Allocate bts_ctx only if necessary (Li RongQing)
   - Apply static call for drain_pebs (Peter Zijlstra)
 
 x86 AMD PMU enhancements: (Ravi Bangoria)
 
   - Remove pointless sample period check
   - Fix ->config to sample period calculation for OP PMU
   - Fix perf_ibs_op.cnt_mask for CurCnt
   - Don't allow freq mode event creation through ->config interface
   - Add PMU specific minimum period
   - Add ->check_period() callback
   - Ceil sample_period to min_period
   - Add support for OP Load Latency Filtering
   - Update DTLB/PageSize decode logic
 
 Hardware breakpoints:
 
   - Return EOPNOTSUPP for unsupported breakpoint type (Saket Kumar Bhaskar)
 
 Hardlockup detector improvements: (Li Huafei)
 
   - perf_event memory leak
   - Warn if watchdog_ev is leaked
 
 Fixes and cleanups:
 
   - Misc fixes and cleanups (Andy Shevchenko, Kan Liang, Peter Zijlstra,
     Ravi Bangoria, Thorsten Blum, XieLudan)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfehRIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hF3g//TCAQijI6OFNpYiD1xoyMq4m+baIYhYx0
 lnxwxhsN58JFcEJeWIEGLACqUePyH68jNKVSr9sIoeV4gnnMX+x2Ny6rh/1H3Ox+
 jQyVmPdFmKa8QG7wGNjcDteIzlEKK4zqruXWaG54LX2e6kbQZWwd0I21MyXkrHXb
 oMIfyZbCAWuPW1wefZm8FPgImT+nvwOosyx90OVagGqk5mYdNb9DFhMjQveStHdQ
 BnWU6rYdW1c2eXKpeuvxY4uWQoCELC6WntLimvcswy6fb+9LtbglpCYQOGGDrGvp
 v3RASf/8clFVSau8P/8NEaNgLgjN/e3eN/fAoSut8Z22nAeBC6qv4qjFt1piDpbs
 AaEXYCYM0/Tfzjp3ctPsFrxbKvB8q2qhxSm37Co0Ix6WyJn3JQbNx48g8GIod2os
 eGPXSZzoz9O8coeTKKbxWp4fpAjFfyfe/ovWQuVd8JI4bYj7Mi63J+RxQDd2TkJP
 H+IgxZoamJExgS1YcKJUBtw7QKQm5pHFx03Br7KsNxgmHy7JdoN9bh0h14pkeXjB
 MnAvWOS5ouuriJgQ+4bqAezS8DSHnDdmFmWgNEEqAlOD9Zy9hDXJ2GiqbHKMyRNC
 ae35o0PDUFTIX9O5NPIDUyWtJb5uH/S1lQhS7GD+ODlMDIX+ny+REXf9krSCR1H0
 GUqq2UmxBGA=
 =iPmA
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:
 "Core:
   - Move perf_event sysctls into kernel/events/ (Joel Granados)
   - Use POLLHUP for pinned events in error (Namhyung Kim)
   - Avoid the read if the count is already updated (Peter Zijlstra)
   - Allow the EPOLLRDNORM flag for poll (Tao Chen)
   - locking/percpu-rwsem: Add guard support [ NOTE: this got
     (mis-)merged into the perf tree due to related work ] (Peter
     Zijlstra)

  perf_pmu_unregister() related improvements: (Peter Zijlstra)
   - Simplify the perf_event_alloc() error path
   - Simplify the perf_pmu_register() error path
   - Simplify perf_pmu_register()
   - Simplify perf_init_event()
   - Simplify perf_event_alloc()
   - Merge struct pmu::pmu_disable_count into struct
     perf_cpu_pmu_context::pmu_disable_count
   - Add this_cpc() helper
   - Introduce perf_free_addr_filters()
   - Robustify perf_event_free_bpf_prog()
   - Simplify the perf_mmap() control flow
   - Further simplify perf_mmap()
   - Remove retry loop from perf_mmap()
   - Lift event->mmap_mutex in perf_mmap()
   - Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes
   - Fix perf_mmap() failure path

  Uprobes:
   - Harden x86 uretprobe syscall trampoline check (Jiri Olsa)
   - Remove redundant spinlock in uprobe_deny_signal() (Liao Chang)
   - Remove the spinlock within handle_singlestep() (Liao Chang)

  x86 Intel PMU enhancements:
   - Support PEBS counters snapshotting (Kan Liang)
   - Fix intel_pmu_read_event() (Kan Liang)
   - Extend per event callchain limit to branch stack (Kan Liang)
   - Fix system-wide LBR profiling (Kan Liang)
   - Allocate bts_ctx only if necessary (Li RongQing)
   - Apply static call for drain_pebs (Peter Zijlstra)

  x86 AMD PMU enhancements: (Ravi Bangoria)
   - Remove pointless sample period check
   - Fix ->config to sample period calculation for OP PMU
   - Fix perf_ibs_op.cnt_mask for CurCnt
   - Don't allow freq mode event creation through ->config interface
   - Add PMU specific minimum period
   - Add ->check_period() callback
   - Ceil sample_period to min_period
   - Add support for OP Load Latency Filtering
   - Update DTLB/PageSize decode logic

  Hardware breakpoints:
   - Return EOPNOTSUPP for unsupported breakpoint type (Saket Kumar
     Bhaskar)

  Hardlockup detector improvements: (Li Huafei)
   - perf_event memory leak
   - Warn if watchdog_ev is leaked

  Fixes and cleanups:
   - Misc fixes and cleanups (Andy Shevchenko, Kan Liang, Peter
     Zijlstra, Ravi Bangoria, Thorsten Blum, XieLudan)"

* tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  perf: Fix __percpu annotation
  perf: Clean up pmu specific data
  perf/x86: Remove swap_task_ctx()
  perf/x86/lbr: Fix shorter LBRs call stacks for the system-wide mode
  perf: Supply task information to sched_task()
  perf: attach/detach PMU specific data
  locking/percpu-rwsem: Add guard support
  perf: Save PMU specific data in task_struct
  perf: Extend per event callchain limit to branch stack
  perf/ring_buffer: Allow the EPOLLRDNORM flag for poll
  perf/core: Use POLLHUP for pinned events in error
  perf/core: Use sysfs_emit() instead of scnprintf()
  perf/core: Remove optional 'size' arguments from strscpy() calls
  perf/x86/intel/bts: Check if bts_ctx is allocated when calling BTS functions
  uprobes/x86: Harden uretprobe syscall trampoline check
  watchdog/hardlockup/perf: Warn if watchdog_ev is leaked
  watchdog/hardlockup/perf: Fix perf_event memory leak
  perf/x86: Annotate struct bts_buffer::buf with __counted_by()
  perf/core: Clean up perf_try_init_event()
  perf/core: Fix perf_mmap() failure path
  ...
2025-03-24 21:46:36 -07:00
Linus Torvalds 2f2d529458 bitmap changes for 6.15
This includes:
  - cpumask_next_wrap() rework from me;
  - GENMASK() simplification from I Hsin;
  - rust bindings for cpumasks from Viresh and me;
  - scattered cleanups from Andy, Tamir, Vincent, Ignacio and Joel.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmfhicUACgkQsUSA/Tof
 vsiT1Av/TFpTFPcfb0/U6zTjhphqSkhCqBN4JcT+Qh1pyFN3Q8xh7FIRjqm6PoWb
 wypQTrsOuS1UImfxj2PkHPiagDHz3LBWRJ1WCBZPF3FgZaFdOtVDObn91APaX4Jz
 K7B2eghnDLk74+eV3aBLVCPgdFPm4Og+3W2J9loWDHYNBrlgQX/3T8gZzJcIzDxk
 8jDiy84cGQweW3K6VDr7WGb/gDBTNXKByFig4+rzuW8X/VcUB1wZi1lHqTL3yBMm
 hXGsa8/VFLVKpRhZxx7PeTiXF+Wp4Tu7iyCuLVK9F9P9pY4GBZ9KV69yaeHLwlwF
 P4eA3Lj1KvtwmZYDT19lB8V0El7nZzcTHtmSgII8JEniWvuVQjjARicIqFqh6zmX
 QaLOt/gfGT/tr9nPzsFMgQxHV0ocibqWmM0gZyfEDsqIX0ynSh1fbMf52PrbBBSX
 aOaVV55HWIjHzLPzqvVee8JMaCwn4hNDrVaWItedQzZkf8aXKLk/GUWYaaEwQ8yY
 N7D3sXbT
 =Bm5k
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-for-6.15' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - cpumask_next_wrap() rework (me)

 - GENMASK() simplification (I Hsin)

 - rust bindings for cpumasks (Viresh and me)

 - scattered cleanups (Andy, Tamir, Vincent, Ignacio and Joel)

* tag 'bitmap-for-6.15' of https://github.com/norov/linux: (22 commits)
  cpumask: align text in comment
  riscv: fix test_and_{set,clear}_bit ordering documentation
  treewide: fix typo 'unsigned __init128' -> 'unsigned __int128'
  MAINTAINERS: add rust bindings entry for bitmap API
  rust: Add cpumask helpers
  uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"
  cpumask: drop cpumask_next_wrap_old()
  PCI: hv: Switch hv_compose_multi_msi_req_get_cpu() to using cpumask_next_wrap()
  scsi: lpfc: rework lpfc_next_{online,present}_cpu()
  scsi: lpfc: switch lpfc_irq_rebalance() to using cpumask_next_wrap()
  s390: switch stop_machine_yield() to using cpumask_next_wrap()
  padata: switch padata_find_next() to using cpumask_next_wrap()
  cpumask: use cpumask_next_wrap() where appropriate
  cpumask: re-introduce cpumask_next{,_and}_wrap()
  cpumask: deprecate cpumask_next_wrap()
  powerpc/xmon: simplify xmon_batch_next_cpu()
  ibmvnic: simplify ibmvnic_set_queue_affinity()
  virtio_net: simplify virtnet_set_affinity()
  objpool: rework objpool_pop()
  cpumask: add for_each_{possible,online}_cpu_wrap
  ...
2025-03-24 19:11:58 -07:00
Linus Torvalds f81c2b8150 It has been a reasonably busy cycle for docs...
- Significant changes throughout the tree to bring Python code up to
   current standards and raise the minimum Python required to 3.9.  Much of
   this is preparatory to replacing the ancient Perl scripts/kernel-doc
   horror with a slightly less horrifying Python implementation, expected
   for 6.16.
 
 - Update the minimum Sphinx required to 3.4.3, allowing us to remove a
   bunch of older compatibility code.
 
 - Rework and improve the generation of the ABI documentation.
 
   (All of the above done by Mauro)
 
 - Lots of translation updates.  Alex Shi and Yanteng Si are taking on
   responsibility for the Chinese translations going forward; that work will
   still get to you via docs-next
 
 - Try to standardize the format for indicating a developer's affiliation in
   commit tags.
 
 - Clarify the TAB's role in CoC enforcement actions.
 
 - Try to spell out the rules for when a commit tag can name another
   developer without their explicit permission.
 
 Plus lots of other typo fixes and updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmfccxwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YoYcH/jL/nS8YAiJ3awF5PH5tR3m5ddt9l+fKXWJx
 PB3KcHtDORbWltTA+Tvo2aP1jxGY9wqsIIvl+nvjJyUcfd72g4HNfTDUDXwP3OFU
 wTkaEAQp3n/hqnLXtJ2AzV3Ir5cIfEL2d7F6QsN1Gnof8iu2OuMk5iMeb0iexUX6
 FYjJq+jknh30VdAp2hxHy8q17R7h7PySh5OsjeAYJJroLv60n3DwQgnzHjXC/FT2
 Qq1UuEzlSpRoso2o2NwVTND6OVW081umo6YrioqD7ZC2G2fhRgLFJJtJGXDNcyUl
 gQv9xLSaTD97V4zaWPm28ObNBpY/GnAd4hMjB17wAH5xUfVS5Aw=
 =Gvdp
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.15' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a reasonably busy cycle for docs...

   - Significant changes throughout the tree to bring Python code up to
     current standards and raise the minimum Python required to 3.9

     Much of this is preparatory to replacing the ancient Perl
     scripts/kernel-doc horror with a slightly less horrifying Python
     implementation, expected for 6.16

   - Update the minimum Sphinx required to 3.4.3, allowing us to remove
     a bunch of older compatibility code

   - Rework and improve the generation of the ABI documentation

  (All of the above done by Mauro)

   - Lots of translation updates. Alex Shi and Yanteng Si are taking on
     responsibility for the Chinese translations going forward; that
     work will still get to you via docs-next

   - Try to standardize the format for indicating a developer's
     affiliation in commit tags

   - Clarify the TAB's role in CoC enforcement actions

   - Try to spell out the rules for when a commit tag can name another
     developer without their explicit permission

  Plus lots of other typo fixes and updates"

* tag 'docs-6.15' of git://git.lwn.net/linux: (98 commits)
  docs/zh_CN: fix spelling mistake
  docs/Chinese: change the disclaimer words
  docs/zh_CN: Add snp-tdx-threat-model index Chinese translation
  docs: driver-api: firmware: clarify userspace requirements
  docs: clarify rules wrt tagging other people
  docs: Remove outdated highuid.rst documentation
  Documentation: dma-buf: heaps: Add heap name definitions
  docs/.../submit-checklist: Use Documentation/admin-guide/abi.rst for cross-ref of README
  docs: Correct installation instruction
  Documentation: kcsan: fix "Plain Accesses and Data Races" URL in kcsan.rst
  Documentation/CoC: Spell out the TAB role in enforcement decisions
  Documentation: ocxl.rst: Update consortium site
  scripts: get_feat.pl: substitute s390x with s390
  scripts/kernel-doc: drop dead code for Wcontents_before_sections
  scripts/kernel-doc: don't add not needed new lines
  docs: driver-api/infiniband.rst: fix Kerneldoc markup
  drivers: firewire: firewire-cdev.h: fix identation on a kernel-doc markup
  drivers: media: intel-ipu3.h: fix identation on a kernel-doc markup
  include/asm-generic/io.h: fix kerneldoc markup
  Docs/arch/arm64: Fix spelling in amu.rst
  ...
2025-03-24 18:42:27 -07:00
Linus Torvalds fc13a78e1f hardening updates for v6.15-rc1
- loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan Vadivel)
 
 - samples/check-exec: Fix script name (Mickaël Salaün)
 
 - yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)
 
 - lib/string_choices: Sort by function name (R Sundar)
 
 - hardening: Allow default HARDENED_USERCOPY to be set at compile time
   (Mel Gorman)
 
 - uaccess: Split out compile-time checks into ucopysize.h
 
 - kbuild: clang: Support building UM with SUBARCH=i386
 
 - x86: Enable i386 FORTIFY_SOURCE on Clang 16+
 
 - ubsan/overflow: Rework integer overflow sanitizer option
 
 - Add missing __nonstring annotations for callers of memtostr*()/strtomem*()
 
 - Add __must_be_noncstr() and have memtostr*()/strtomem*() check for it
 
 - Introduce __nonstring_array for silencing future GCC 15 warnings
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ9hGrgAKCRA2KwveOeQk
 u1WvAQC3ZxFu3b0Omfmht2pPqCltf2UOQNvUx3egjoeXpUaNSgD+Lxr/T4xksy7E
 jHh7rCYDkruOWs3DHA5JjRQcf0BBLQo=
 =FTQp
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "As usual, it's scattered changes all over. Patches touching things
  outside of our traditional areas in the tree have been Acked by
  maintainers or were trivial changes:

   - loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan
     Vadivel)

   - samples/check-exec: Fix script name (Mickaël Salaün)

   - yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)

   - lib/string_choices: Sort by function name (R Sundar)

   - hardening: Allow default HARDENED_USERCOPY to be set at compile
     time (Mel Gorman)

   - uaccess: Split out compile-time checks into ucopysize.h

   - kbuild: clang: Support building UM with SUBARCH=i386

   - x86: Enable i386 FORTIFY_SOURCE on Clang 16+

   - ubsan/overflow: Rework integer overflow sanitizer option

   - Add missing __nonstring annotations for callers of
     memtostr*()/strtomem*()

   - Add __must_be_noncstr() and have memtostr*()/strtomem*() check for
     it

   - Introduce __nonstring_array for silencing future GCC 15 warnings"

* tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
  compiler_types: Introduce __nonstring_array
  hardening: Enable i386 FORTIFY_SOURCE on Clang 16+
  x86/build: Remove -ffreestanding on i386 with GCC
  ubsan/overflow: Enable ignorelist parsing and add type filter
  ubsan/overflow: Enable pattern exclusions
  ubsan/overflow: Rework integer overflow sanitizer option to turn on everything
  samples/check-exec: Fix script name
  yama: don't abuse rcu_read_lock/get_task_struct in yama_task_prctl()
  kbuild: clang: Support building UM with SUBARCH=i386
  loadpin: remove MODULE_COMPRESS_NONE as it is no longer supported
  lib/string_choices: Rearrange functions in sorted order
  string.h: Validate memtostr*()/strtomem*() arguments more carefully
  compiler.h: Introduce __must_be_noncstr()
  nilfs2: Mark on-disk strings as nonstring
  uapi: stddef.h: Introduce __kernel_nonstring
  x86/tdx: Mark message.bytes as nonstring
  string: kunit: Mark nonstring test strings as __nonstring
  scsi: qla2xxx: Mark device strings as nonstring
  scsi: mpt3sas: Mark device strings as nonstring
  scsi: mpi3mr: Mark device strings as nonstring
  ...
2025-03-24 15:18:08 -07:00
Linus Torvalds 4f773fcbdb execve updates for v6.15-rc1
- elf: Define and use note name macros (Akihiko Odaki)
 
 - elf: add remaining SHF_ flag macros (Timur Tabi)
 
 - binfmt: Remove loader from linux_binprm struct (Yonatan Goldschmidt)
 
 - binfmt_elf_fdpic: fix variable set but not used warning (sunliming)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ9hJxQAKCRA2KwveOeQk
 uwhAAP9WXP77FCfAvJkmOfDI5FSa6g2WH/BnUOtZW63lSoXnTAEAuttg8W1IKcl/
 Db+R/RLS3QqrU+ib5xWBeA6ZvxZXCwA=
 =Svn4
 -----END PGP SIGNATURE-----

Merge tag 'execve-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve updates from Kees Cook:

 - elf: Define and use note name macros (Akihiko Odaki)

 - elf: add remaining SHF_ flag macros (Timur Tabi)

 - binfmt: Remove loader from linux_binprm struct (Yonatan Goldschmidt)

 - binfmt_elf_fdpic: fix variable set but not used warning (sunliming)

* tag 'execve-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt_elf_fdpic: fix variable set but not used warning
  elf: add remaining SHF_ flag macros
  binfmt: Remove loader from linux_binprm struct
  crash: Remove KEXEC_CORE_NOTE_NAME
  s390/crash: Use note name macros
  crash: Use note name macros
  powerpc/crash: Use note name macros
  binfmt_elf: Use note name macros
  elf: Define note name macros
2025-03-24 14:58:04 -07:00
Linus Torvalds df00ded23a vfs-6.15-rc1.pidfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90pqgAKCRCRxhvAZXjc
 oqVsAP9Aq/fMCI14HeXehPCezKQZPu1HTrPPo2clLHXoSnafawEAsA3YfWTT4Heb
 iexzqvAEUOMYOVN66QEc+6AAwtMLrwc=
 =0eYo
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs pidfs updates from Christian Brauner:

 - Allow retrieving exit information after a process has been reaped
   through pidfds via the new PIDFD_INTO_EXIT extension for the
   PIDFD_GET_INFO ioctl. Various tools need access to information about
   a process/task even after it has already been reaped.

   Pidfd polling allows waiting on either task exit or for a task to
   have been reaped. The contract for PIDFD_INFO_EXIT is simply that
   EPOLLHUP must be observed before exit information can be retrieved,
   i.e., exit information is only provided once the task has been reaped
   and then can be retrieved as long as the pidfd is open.

 - Add PIDFD_SELF_{THREAD,THREAD_GROUP} sentinels allowing userspace to
   forgo allocating a file descriptor for their own process. This is
   useful in scenarios where users want to act on their own process
   through pidfds and is akin to AT_FDCWD.

 - Improve premature thread-group leader and subthread exec behavior
   when polling on pidfds:

   (1) During a multi-threaded exec by a subthread, i.e.,
       non-thread-group leader thread, all other threads in the
       thread-group including the thread-group leader are killed and the
       struct pid of the thread-group leader will be taken over by the
       subthread that called exec. IOW, two tasks change their TIDs.

   (2) A premature thread-group leader exit means that the thread-group
       leader exited before all of the other subthreads in the
       thread-group have exited.

   Both cases lead to inconsistencies for pidfd polling with
   PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the
   current thread-group leader may or may not see an exit notification
   on the file descriptor depending on when poll is performed. If the
   poll is performed before the exec of the subthread has concluded an
   exit notification is generated for the old thread-group leader. If
   the poll is performed after the exec of the subthread has concluded
   no exit notification is generated for the old thread-group leader.

   The correct behavior is to simply not generate an exit notification
   on the struct pid of a subhthread exec because the struct pid is
   taken over by the subthread and thus remains alive.

   But this is difficult to handle because a thread-group may exit
   premature as mentioned in (2). In that case an exit notification is
   reliably generated but the subthreads may continue to run for an
   indeterminate amount of time and thus also may exec at some point.

   After this pull no exit notifications will be generated for a
   PIDFD_THREAD pidfd for a thread-group leader until all subthreads
   have been reaped. If a subthread should exec before no exit
   notification will be generated until that task exits or it creates
   subthreads and repeates the cycle.

   This means an exit notification indicates the ability for the father
   to reap the child.

* tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  selftests/pidfd: third test for multi-threaded exec polling
  selftests/pidfd: second test for multi-threaded exec polling
  selftests/pidfd: first test for multi-threaded exec polling
  pidfs: improve multi-threaded exec and premature thread-group leader exit polling
  pidfs: ensure that PIDFS_INFO_EXIT is available
  selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest
  selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add third PIDFD_INFO_EXIT selftest
  selftests/pidfd: add second PIDFD_INFO_EXIT selftest
  selftests/pidfd: add first PIDFD_INFO_EXIT selftest
  selftests/pidfd: expand common pidfd header
  pidfs/selftests: ensure correct headers for ioctl handling
  selftests/pidfd: fix header inclusion
  pidfs: allow to retrieve exit information
  pidfs: record exit code and cgroupid at exit
  pidfs: use private inode slab cache
  pidfs: move setting flags into pidfs_alloc_file()
  pidfd: rely on automatic cleanup in __pidfd_prepare()
  ...
2025-03-24 10:16:37 -07:00
Linus Torvalds fd101da676 vfs-6.15-rc1.mount
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90qAwAKCRCRxhvAZXjc
 on7lAP0akpIsJMWREg9tLwTNTySI1b82uKec0EAgM6T7n/PYhAD/T4zoY8UYU0Pr
 qCxwTXHUVT6bkNhjREBkfqq9OkPP8w8=
 =GxeN
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs mount updates from Christian Brauner:

 - Mount notifications

   The day has come where we finally provide a new api to listen for
   mount topology changes outside of /proc/<pid>/mountinfo. A mount
   namespace file descriptor can be supplied and registered with
   fanotify to listen for mount topology changes.

   Currently notifications for mount, umount and moving mounts are
   generated. The generated notification record contains the unique
   mount id of the mount.

   The listmount() and statmount() api can be used to query detailed
   information about the mount using the received unique mount id.

   This allows userspace to figure out exactly how the mount topology
   changed without having to generating diffs of /proc/<pid>/mountinfo
   in userspace.

 - Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount
   api

 - Support detached mounts in overlayfs

   Since last cycle we support specifying overlayfs layers via file
   descriptors. However, we don't allow detached mounts which means
   userspace cannot user file descriptors received via
   open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to
   attach them to a mount namespace via move_mount() first.

   This is cumbersome and means they have to undo mounts via umount().
   Allow them to directly use detached mounts.

 - Allow to retrieve idmappings with statmount

   Currently it isn't possible to figure out what idmapping has been
   attached to an idmapped mount. Add an extension to statmount() which
   allows to read the idmapping from the mount.

 - Allow creating idmapped mounts from mounts that are already idmapped

   So far it isn't possible to allow the creation of idmapped mounts
   from already idmapped mounts as this has significant lifetime
   implications. Make the creation of idmapped mounts atomic by allow to
   pass struct mount_attr together with the open_tree_attr() system call
   allowing to solve these issues without complicating VFS lookup in any
   way.

   The system call has in general the benefit that creating a detached
   mount and applying mount attributes to it becomes an atomic operation
   for userspace.

 - Add a way to query statmount() for supported options

   Allow userspace to query which mount information can be retrieved
   through statmount().

 - Allow superblock owners to force unmount

* tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
  umount: Allow superblock owners to force umount
  selftests: add tests for mount notification
  selinux: add FILE__WATCH_MOUNTNS
  samples/vfs: fix printf format string for size_t
  fs: allow changing idmappings
  fs: add kflags member to struct mount_kattr
  fs: add open_tree_attr()
  fs: add copy_mount_setattr() helper
  fs: add vfs_open_tree() helper
  statmount: add a new supported_mask field
  samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP
  selftests: add tests for using detached mount with overlayfs
  samples/vfs: check whether flag was raised
  statmount: allow to retrieve idmappings
  uidgid: add map_id_range_up()
  fs: allow detached mounts in clone_private_mount()
  selftests/overlayfs: test specifying layers as O_PATH file descriptors
  fs: support O_PATH fds with FSCONFIG_SET_FD
  vfs: add notifications for mount attach and detach
  fanotify: notify on mount attach and detach
  ...
2025-03-24 09:34:10 -07:00
Nuno Das Neves 621191d709 Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs
Provide a set of IOCTLs for creating and managing child partitions when
running as root partition on Hyper-V. The new driver is enabled via
CONFIG_MSHV_ROOT.

A brief overview of the interface:

MSHV_CREATE_PARTITION is the entry point, returning a file descriptor
representing a child partition. IOCTLs on this fd can be used to map
memory, create VPs, etc.

Creating a VP returns another file descriptor representing that VP which
in turn has another set of corresponding IOCTLs for running the VP,
getting/setting state, etc.

MSHV_ROOT_HVCALL is a generic "passthrough" hypercall IOCTL which can be
used for a number of partition or VP hypercalls. This is for hypercalls
that do not affect any state in the kernel driver, such as getting and
setting VP registers and partition properties, translating addresses,
etc. It is "passthrough" because the binary input and output for the
hypercall is only interpreted by the VMM - the kernel driver does
nothing but insert the VP and partition id where necessary (which are
always in the same place), and execute the hypercall.

Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Co-developed-by: Jinank Jain <jinankjain@microsoft.com>
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Co-developed-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Co-developed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-developed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Co-developed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
Link: https://lore.kernel.org/r/1741980536-3865-11-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1741980536-3865-11-git-send-email-nunodasneves@linux.microsoft.com>
2025-03-21 18:24:22 +00:00
Arnd Bergmann 3fed9fda15 net: remove sb1000 cable modem driver
This one is hilariously outdated, it provided a faster downlink over
TV cable for users of analog modems in the 1990s, through an ISA card.

The web page for the userspace tools has been broken for 25 years, and
the driver has only ever seen mechanical updates.

Link: http://web.archive.org/web/20000611165545/http://home.adelphia.net:80/~siglercm/sb1000.html
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312085236.2531870-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21 17:11:54 +01:00
Mickaël Salaün 15383a0d63
landlock: Add the errata interface
Some fixes may require user space to check if they are applied on the
running kernel before using a specific feature.  For instance, this
applies when a restriction was previously too restrictive and is now
getting relaxed (e.g. for compatibility reasons).  However, non-visible
changes for legitimate use (e.g. security fixes) do not require an
erratum.

Because fixes are backported down to a specific Landlock ABI, we need a
way to avoid cherry-pick conflicts.  The solution is to only update a
file related to the lower ABI impacted by this issue.  All the ABI files
are then used to create a bitmask of fixes.

The new errata interface is similar to the one used to get the supported
Landlock ABI version, but it returns a bitmask instead because the order
of fixes may not match the order of versions, and not all fixes may
apply to all versions.

The actual errata will come with dedicated commits.  The description is
not actually used in the code but serves as documentation.

Create the landlock_abi_version symbol and use its value to check errata
consistency.

Update test_base's create_ruleset_checks_ordering tests and add errata
tests.

This commit is backportable down to the first version of Landlock.

Fixes: 3532b0b435 ("landlock: Enable user space to infer supported features")
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-21 12:12:19 +01:00
Jens Axboe 07754bfd9a io_uring: enable toggle of iowait usage when waiting on CQEs
By default, io_uring marks a waiting task as being in iowait, if it's
sleeping waiting on events and there are pending requests. This isn't
necessarily always useful, and may be confusing on non-storage setups
where iowait isn't expected. It can also cause extra power usage, by
preventing the CPU from entering lower sleep states.

This adds a new enter flag, IORING_ENTER_NO_IOWAIT. If set, then
io_uring will not account the sleeping task as being in iowait. If the
kernel supports this feature, then it will be marked by having the
IORING_FEAT_NO_IOWAIT feature flag set.

As the kernel currently does not support separating the iowait
accounting and CPU frequency boosting, the IORING_ENTER_NO_IOWAIT
controls both of these at the same time. In the future, if those do end
up being split, then it'd be possible to control them separately.
However, it seems more likely that the kernel will decouple iowait and
CPU frequency boosting anyway.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-20 20:01:03 -06:00
Oliver Upton 4f2774c57a Merge branch 'kvm-arm64/writable-midr' into kvmarm/next
* kvm-arm64/writable-midr:
  : Writable implementation ID registers, courtesy of Sebastian Ott
  :
  : Introduce a new capability that allows userspace to set the
  : ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1,
  : and AIDR_EL1. Also plug a hole in KVM's trap configuration where
  : SMIDR_EL1 was readable at EL1, despite the fact that KVM does not
  : support SME.
  KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
  KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable
  KVM: arm64: Copy guest CTR_EL0 into hyp VM
  KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR
  KVM: arm64: Allow userspace to change the implementation ID registers
  KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value
  KVM: arm64: Maintain per-VM copy of implementation ID regs
  KVM: arm64: Set HCR_EL2.TID1 unconditionally

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:54:32 -07:00
Christian Brauner 68db272741
pidfs: ensure that PIDFS_INFO_EXIT is available
When we currently create a pidfd we check that the task hasn't been
reaped right before we create the pidfd. But it is of course possible
that by the time we return the pidfd to userspace the task has already
been reaped since we don't check again after having created a dentry for
it.

This was fine until now because that race was meaningless. But now that
we provide PIDFD_INFO_EXIT it is a problem because it is possible that
the kernel returns a reaped pidfd and it depends on the race whether
PIDFD_INFO_EXIT information is available. This depends on if the task
gets reaped before or after a dentry has been attached to struct pid.

Make this consistent and only returned pidfds for reaped tasks if
PIDFD_INFO_EXIT information is available. This is done by performing
another check whether the task has been reaped right after we attached a
dentry to struct pid.

Since pidfs_exit() is called before struct pid's task linkage is removed
the case where the task got reaped but a dentry was already attached to
struct pid and exit information was recorded and published can be
handled correctly. In that case we do return a pidfd for a reaped task
like we would've before.

Link: https://lore.kernel.org/r/20250316-kabel-fehden-66bdb6a83436@brauner
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-19 14:40:18 +01:00
Daniel Vacek fc5c0c5825 btrfs: defrag: extend ioctl to accept compression levels
The zstd and zlib compression types support setting compression levels.
Enhance the defrag interface to specify the levels as well. For zstd the
negative (realtime) levels are also accepted.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-18 20:35:50 +01:00
Nicolin Chen e7d3fa3d29 iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
Aside from the IOPF framework, iommufd provides an additional pathway to
report hardware events, via the vEVENTQ of vIOMMU infrastructure.

Define an iommu_vevent_arm_smmuv3 uAPI structure, and report stage-1 events
in the threaded IRQ handler. Also, add another four event record types that
can be forwarded to a VM.

Link: https://patch.msgid.link/r/5cf6719682fdfdabffdb08374cdf31ad2466d75a.1741719725.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-18 14:17:48 -03:00
Nicolin Chen e36ba5ab80 iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC
Introduce a new IOMMUFD_OBJ_VEVENTQ object for vIOMMU Event Queue that
provides user space (VMM) another FD to read the vIOMMU Events.

Allow a vIOMMU object to allocate vEVENTQs, with a condition that each
vIOMMU can only have one single vEVENTQ per type.

Add iommufd_veventq_alloc() with iommufd_veventq_ops for the new ioctl.

Link: https://patch.msgid.link/r/21acf0751dd5c93846935ee06f93b9c65eff5e04.1741719725.git.nicolinc@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-18 14:17:47 -03:00
Paolo Abeni ed6bcbe39e This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
 
  - drop batadv_priv_debug_log struct, by Sven Eckelmann
 
  - adopt netdev_hold() / netdev_put(), by Eric Dumazet
 
  - add support for jumbo frames, by Sven Eckelmann
 
  - use consistent name for mesh interface, by Sven Eckelmann
 
  - cleanup B.A.T.M.A.N. IV OGM aggregation handling,
    by Sven Eckelmann (4 patches)
 
  - add missing newlines for log macros, by Sven Eckelmann
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmfTCE0WHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoea+EACQ3fJp7kf6iWxhJLiwon3Aoo4u
 dg84xX5nc02KHT/zUC/ADSSN3FDdc3y8jgjpLfZv5TkWmRUqMNP9449PaoXpiIsW
 oI6HKrVaEKIEPpMAaBrv0dPtG5fBZPku4tZBLP7ekU/2N0Ri3eKTCph7IznO3Ma5
 cM4FN6H9Nxwnht2qN7Gx3nWoBn9LeaSyKfAFLR+hI3QxTh9PYLV26rGCLrUhbbvT
 St8TtgdeULSH/50rhMqC4wpT8verk+BMJdShNPLyse4QSRky6DD6YOj7zS/c7akP
 z8S77gvp0mcgzb/t49K4AFjUFEvK9l0rvPDx6DMrfJwQAZUPjFY93y1XI5QUOPRZ
 +kVj0nr2XRc7c1/wDNfn4131T9Twi41+TX6BI8Fan8WpWKVTo2a/O5VWk5pHKjy+
 a8aZqL41444wWyFhyCL+Io1y1mDc9TDa7644+fJjR5H4h47M9rGnLVEX3F9mVtBl
 LIOwpHpHXg9z3JDgpCwLWykUzW4lEuF1hQFUD3DPIbVKENjvibiiB+qPNZJG1icL
 n9Pt6843vF8gcw3aGbp0VGsESILiKr7EN7CMkhfcO1CaqAErJG4DinubH6DLQIjO
 uW5Ss2751BfqWPOakWVzvvWafnleVRdw4rysbil4ko/xONg0qdfTMWe5fS42bkbk
 CMmGPNeSiqVTXp5bIg==
 =hmiI
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - drop batadv_priv_debug_log struct, by Sven Eckelmann

 - adopt netdev_hold() / netdev_put(), by Eric Dumazet

 - add support for jumbo frames, by Sven Eckelmann

 - use consistent name for mesh interface, by Sven Eckelmann

 - cleanup B.A.T.M.A.N. IV OGM aggregation handling,
   by Sven Eckelmann (4 patches)

 - add missing newlines for log macros, by Sven Eckelmann

* tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge:
  batman-adv: add missing newlines for log macros
  batman-adv: Limit aggregation size to outgoing MTU
  batman-adv: Use actual packet count for aggregated packets
  batman-adv: Switch to bitmap helper for aggregation handling
  batman-adv: Limit number of aggregated packets directly
  batman-adv: Use consistent name for mesh interface
  batman-adv: Add support for jumbo frames
  batman-adv: adopt netdev_hold() / netdev_put()
  batman-adv: Drop batadv_priv_debug_log struct
  batman-adv: Start new development cycle
====================

Link: https://patch.msgid.link/20250313164519.72808-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18 12:10:20 +01:00
Johannes Berg c924c5e9b8 Merge net-next/main to resolve conflicts
There are a few conflicts between the work that went
into wireless and that's here now, resolve them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18 09:46:36 +01:00
Mykyta Yatsenko 0de445d18e bpf: BPF token support for BPF_BTF_GET_FD_BY_ID
Currently BPF_BTF_GET_FD_BY_ID requires CAP_SYS_ADMIN, which does not
allow running it from user namespace. This creates a problem when
freplace program running from user namespace needs to query target
program BTF.
This patch relaxes capable check from CAP_SYS_ADMIN to CAP_BPF and adds
support for BPF token that can be passed in attributes to syscall.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250317174039.161275-2-mykyta.yatsenko5@gmail.com
2025-03-17 13:45:11 -07:00
Ilpo Järvinen 2c2f08d31d tcp: extend TCP flags to allow AE bit/ACE field
With AccECN, there's one additional TCP flag to be used (AE)
and ACE field that overloads the definition of AE, CWR, and
ECE flags. As tcp_flags was previously only 1 byte, the
byte-order stuff needs to be added to it's handling.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-17 13:49:46 +00:00
Kan Liang c53e14f1ea perf: Extend per event callchain limit to branch stack
The commit 97c79a38cd ("perf core: Per event callchain limit")
introduced a per-event term to allow finer tuning of the depth of
callchains to save space.

It should be applied to the branch stack as well. For example, autoFDO
collections require maximum LBR entries. In the meantime, other
system-wide LBR users may only be interested in the latest a few number
of LBRs. A per-event LBR depth would save the perf output buffer.

The patch simply drops the uninterested branches, but HW still collects
the maximum branches. There may be a model-specific optimization that
can reduce the HW depth for some cases to reduce the overhead further.
But it isn't included in the patch set. Because it's not useful for all
cases. For example, ARCH LBR can utilize the PEBS and XSAVE to collect
LBRs. The depth should have less impact on the collecting overhead.
The model-specific optimization may be implemented later separately.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250310181536.3645382-1-kan.liang@linux.intel.com
2025-03-17 11:23:36 +01:00
Ahmad Fatoum e016173f65 reboot: add support for configuring emergency hardware protection action
We currently leave the decision of whether to shutdown or reboot to
protect hardware in an emergency situation to the individual drivers.

This works out in some cases, where the driver detecting the critical
failure has inside knowledge: It binds to the system management controller
for example or is guided by hardware description that defines what to do.

In the general case, however, the driver detecting the issue can't know
what the appropriate course of action is and shouldn't be dictating the
policy of dealing with it.

Therefore, add a global hw_protection toggle that allows the user to
specify whether shutdown or reboot should be the default action when the
driver doesn't set policy.

This introduces no functional change yet as hw_protection_trigger() has no
callers, but these will be added in subsequent commits.

[arnd@arndb.de: hide unused hw_protection_attr]
  Link: https://lkml.kernel.org/r/20250224141849.1546019-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-7-e1c09b090c0c@pengutronix.de
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
Cc: Benson Leung <bleung@chromium.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Fabio Estevam <festevam@denx.de>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matteo Croce <teknoraver@meta.com>
Cc: Matti Vaittinen <mazziesaccount@gmail.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rob Herring (Arm) <robh@kernel.org>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 23:24:14 -07:00
Peilin Ye 880442305a bpf: Introduce load-acquire and store-release instructions
Introduce BPF instructions with load-acquire and store-release
semantics, as discussed in [1].  Define 2 new flags:

  #define BPF_LOAD_ACQ    0x100
  #define BPF_STORE_REL   0x110

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x100).

Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x110).

Unlike existing atomic read-modify-write operations that only support
BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and
store-releases also support BPF_B (8-bit) and BPF_H (16-bit).  As an
exception, however, 64-bit load-acquires/store-releases are not
supported on 32-bit architectures (to fix a build error reported by the
kernel test robot).

An 8- or 16-bit load-acquire zero-extends the value before writing it to
a 32-bit register, just like ARM64 instruction LDARH and friends.

Similar to existing atomic read-modify-write operations, misaligned
load-acquires/store-releases are not allowed (even if
BPF_F_ANY_ALIGNMENT is set).

As an example, consider the following 64-bit load-acquire BPF
instruction (assuming little-endian):

  db 10 00 00 00 01 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x00000100): BPF_LOAD_ACQ

Similarly, a 16-bit BPF store-release:

  cb 21 00 00 10 01 00 00  store_release((u16 *)(r1 + 0x0), w2)

  opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
  imm (0x00000110): BPF_STORE_REL

In arch/{arm64,s390,x86}/net/bpf_jit_comp.c, have
bpf_jit_supports_insn(..., /*in_arena=*/true) return false for the new
instructions, until the corresponding JIT compiler supports them in
arena.

[1] https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/a217f46f0e445fbd573a1a024be5c6bf1d5fe716.1741049567.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:28 -07:00
Yonghong Song 4b82b181a2 bpf: Allow pre-ordering for bpf cgroup progs
Currently for bpf progs in a cgroup hierarchy, the effective prog array
is computed from bottom cgroup to upper cgroups (post-ordering). For
example, the following cgroup hierarchy
    root cgroup: p1, p2
        subcgroup: p3, p4
have BPF_F_ALLOW_MULTI for both cgroup levels.
The effective cgroup array ordering looks like
    p3 p4 p1 p2
and at run time, progs will execute based on that order.

But in some cases, it is desirable to have root prog executes earlier than
children progs (pre-ordering). For example,
  - prog p1 intends to collect original pkt dest addresses.
  - prog p3 will modify original pkt dest addresses to a proxy address for
    security reason.
The end result is that prog p1 gets proxy address which is not what it
wants. Putting p1 to every child cgroup is not desirable either as it
will duplicate itself in many child cgroups. And this is exactly a use case
we are encountering in Meta.

To fix this issue, let us introduce a flag BPF_F_PREORDER. If the flag
is specified at attachment time, the prog has higher priority and the
ordering with that flag will be from top to bottom (pre-ordering).
For example, in the above example,
    root cgroup: p1, p2
        subcgroup: p3, p4
Let us say p2 and p4 are marked with BPF_F_PREORDER. The final
effective array ordering will be
    p2 p4 p3 p1

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250224230116.283071-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 11:48:25 -07:00
Dionna Glaze b949f55644 crypto: ccp - Fix uAPI definitions of PSP errors
Additions to the error enum after explicit 0x27 setting for
SEV_RET_INVALID_KEY leads to incorrect value assignments.

Use explicit values to match the manufacturer specifications more
clearly.

Fixes: 3a45dc2b41 ("crypto: ccp: Define the SEV-SNP commands")
CC: stable@vger.kernel.org
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-15 16:21:22 +08:00
Binbin Wu 79462faa2b KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError>
Convert TDG.VP.VMCALL<ReportFatalError> to KVM_EXIT_SYSTEM_EVENT with
a new type KVM_SYSTEM_EVENT_TDX_FATAL and forward it to userspace for
handling.

TD guest can use TDG.VP.VMCALL<ReportFatalError> to report the fatal
error it has experienced.  This hypercall is special because TD guest
is requesting a termination with the error information, KVM needs to
forward the hypercall to userspace anyway, KVM doesn't do parsing or
conversion, it just dumps the 16 general-purpose registers to userspace
and let userspace decide what to do.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250222014225.897298-8-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-03-14 14:20:55 -04:00
Niklas Cassel 2b48d3dcb7
PCI: endpoint: pcitest: Add IRQ_TYPE_* defines to UAPI header
These IRQ_TYPE_* defines are used by both drivers/misc/pci_endpoint_test.c
and tools/testing/selftests/pci_endpoint/pci_endpoint_test.c.

Considering that both the misc driver and the selftest already includes
the pcitest.h UAPI header, it makes sense for these IRQ_TYPE_* defines
to be located in the pcitest.h UAPI header.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250310111016.859445-10-cassel@kernel.org
2025-03-14 16:12:04 +00:00
Greg Kroah-Hartman a425990fa9 IIO: New device support, features and cleanup for the 6.15 cycle.
The usual mixture of new drivers, support in existing drivers for new
 devices, a range of features and general subsystem cleanup.
 
 Two merges of immutable branches in here:
 * SPI offload support. Culmination of a long effort to bring the ability
   to offload triggered sequences of SPI operations to specific hardware,
   allow high datarate acquisition over an SPI bus (if you have the right
   hardware / FPGA firmware)
 * GPIO set-array-helper - enables code simplification.
 
 New device support
 ==================
 
 adi,ad3552r-hs:
 - Add support for AD3541r and AD3542r via newly supported FPGA HDL.
 adi,ad4030
 - New driver supporting the AD4030, AD4630 AD4630-16, AD4640-24, AD4632-16,
   AD4632-24 1 and 2 channel high precision SPI ADCs.
 adi,ad4851
 - New driver and backend support for the AD4851, AD4852, AD4853, AD4854,
   AD4855, AD4846, AD4857, AD4858 and AD4858I high speed multichannel
   simultaneous sampling ADCs.
 adi,ad7191
 - New driver for this 24-bit ADC for precision bridge applications,
 adi,ad7380
 - Add support for the adaq4381-4 which is a 14-bit version of the
   already supported adaq4380-1
 adi,adis16550
 - New driver using the ADIS library (which needed extensions) for this
   IMU.
 brcm,apds9160
 - New driver for this proximity and ambient light sensor.
 dynaimage,al3000a
 - New driver for this illuminance sensor.
 mcube,mc3230
 - Add support for the mc3510c accelerometer with a different scale to existing
   supported parts (some rework preceded this)
 nxp,imx93
 - Add compatibles for imx94 and imx95 which are fully compatible with imx93.
 rockchip,saradc
 - Add support for the RK3528 ADC
 - Add support for the RK3562 ADC
 silab,si7210
 - New driver to support this I2C Hall effect magnetic position sensor.
 ti,ads7138
 - New driver supporting the ADS7128 and AD7138 I2C ADCs.
 
 Staging driver drop
 ===================
 
 adi,adis16240
 - Drop this impact sensor. Interesting part but complex hence never left
   staging due to ABI challenges. No longer readily available so drop driver.
 
 New features
 ============
 
 Documentation
 - A really nice overview document introduce ADC terminology and how
   it maps to IIO.
 core
 - New description for FAULT events, used in the ad7173.
 - filter_type ABI used in ad4130.
 buffer-dmaengine
 - Split DMA channel request from buffer allocation (for SPI offload)
 - Add a new _with_handle setup variant. (for SPI offload)
 adi,adf4371
 - Add control of reference clock type and support for frequency doubling
   where appropriate.
 adi,ad4695
 - Support SPI offload.
 - Support oversampling control.
 adi,ad5791
 - Support SPI offload.
 adi,ad7124
 - Add channel calibration support.
 adi,ad7380:
 - Alert support (threshold interrupts)
 - SPI offload support.
 adi,ad7606
 - Support writing registers when using backend enabling software control
   of modes.
 adi,ad7944
 - Support SPI offload.
 adi,ad9832
 - Use devm_regulator_get_enable() to simplify code.
 adi,ad9834
 - Use devm_regulator_get_enable() to simplify code.
 adi,adxl345
 - Improve IRQ handling code.
 - Add debug access to registers.
 bosch,bmi270
 - Add temperature channel support.
 - Add data ready trigger.
 google,cross_ec
 - Add trace events.
 mcube,mc3230
 - Add mount matrix support
 - Add an OF match table.
 
 Cleanup and minor bug fixes
 ===========================
 
 Tree wide:
 - Stop using iio_device_claim_direct_scoped() and introduce sparse friendly
   iio_device_claim/release_direct()
 
   The conditional scoped cleanup has proved hard to deal with, requiring
   workarounds for various compiler issues and in is rather non-intuitive
   so abandon that experiment. One of the attractions of that approach was
   that it made it much harder to have unbalanced   claim/release bugs so
   instead introduce a conditional-lock style boolean returning new pair
   of functions. These are inline in the header and have __acquire and
   __release calls allowing sparse to detect lack of balance.  There are
   occasional false positives but so far those have reflected complex code
   paths that benefited from cleanup anyway.
   The first set of driver conversions are in this pull request, more to
   follow next cycle. Various related cleanup in drivers.
   Removal of the _scoped code is completed and the definition removed.
 - Use of str_enable_disable() and similar helpers.
 - Don't set regmap cache to REGCACHE_NONE as that's the default anyway.
 - Change some caches from RBTREE to MAPLE reflecting best practice.
 - Use the new gpiod_multi_set_value_cansleep()
 - Make sure to grab direct mode for some calibrations paths.
 - Avoid using memcmp on structures when checking for matching channel configs.
   Instead just match field by field.
 dt-bindings:
 - Fix up indentation inconsistencies.
 gts-helper:
 - Simplify building of available scale table.
 adi,ad-sigma-delta
 - Make sure to disable channel after calibration done.
 - Add error handling in configuring channel during calibration.
 adi,ad2s1201
 - use a bitmap_write() rather than directly accessing underlying storage.
 adi,ad3552r-hs
 - Fix a wrong error message.
 - Make sure to use instruction mode for configuration.
 adi,ad4695
 - Add a conversion to ensure exit from conversion mode.
 - Use custom regmap to handle required sclk rate change.
 - Fix an out of bounds array access
 - Simplify oversampling ratio handling.
 adi,ad4851
 - Fix a sign bug.
 adi,ad5791
 - Fix wrong exported number of storage bits.
 adi,ad7124
 - Disable all channels at probe to avoid strange initial configurations.
 adi,ad7173
 - Rework to allow static const struct ad_sigma_delta without need
   to make a copy.
 adi,ad7623
 - Drop a BSD license tag that the authors consider unnecessary.
 adi,ad7768-1
 - Fix channels sign description exposed to user space.
 - Set MOSI idle state to avoid accidental device reset.
 - Avoid some overkill locking.
 adi,axi-dac
 - Check if device interface is busy when enabling data stream.
 - Add control of bus mode.
 bosch,bmi270
 - Move a struct definition to a c file as only used there.
 vishay,veml6030
 - Enable regmap cache to reduce bus traffic.
 - Fix ABI bug around scale reporting.
 vishay,vem6075
 - Check array bounds to harden against broken hardware.
 
 Various other minor tweaks and fixes not called out.
 
 *
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAmfTNbkRHGppYzIzQGtl
 cm5lbC5vcmcACgkQVIU0mcT0Foi41Q/+NTlPu1e9TqOqhXmpJc56EP5wnBaKuxMF
 pgPjKr1bbh6Th131kdeCjcTtUyMeLoLLPzcmSzPVVBOLanTHw9d5DQutg7+k0yUE
 8L0FEGEuSJeN6u6hnCFyKhOgu2xSPsCj1L99UCCsfXrvkUILSQgXSzoUkZDonmWj
 gDd/Zfn1Otxevxct9KG4dHhiy17f734Oxhjv3xyMwHWsUtHndAJiYeYRzRxjMJTh
 GeZHZMYvhnseLagAzfp3Dsqc5UCwm/Kmstz8wTlDW0kq/wbZqPwVLiC2xhPnTeds
 PqAL3m/n7Z1SYBa5ouKZYcn2pMfcYNsd3ZfbFMIX0gecGZZ6zRfSgulVDZKe7tp+
 cFbGhsIVWHt4TLWBevz6xihhyv/8ashi0PfIE1Gju8GZqqwwyR6wmdKJ1CNq3DmL
 W9+74w7xAbN+Fg9C/EbuSk/+uKXFrdf9w1NtpnPzKWVwnR0tFwavgpxRCJpu9xMn
 63nO/NCl+FtxibBBrpsTAb03hA4Fw595kiBBvkNSoNOYmoASj+w/bHhIFohZeteP
 pS3eAQlMRCH4ZCjRDNHIEwO9xoja5aqprhnGCNrVQYR+OCfUw5ELf6GwYpEsmvtt
 GyzgR92qxFBQz71h1/VyvAjxcaNRezO92VJajj+T2iJvj5lD6o/0+cscYWfEWEzn
 dxuaXrpF3ak=
 =4tID
 -----END PGP SIGNATURE-----

Merge tag 'iio-for-6.15a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next

Jonathan writes:

IIO: New device support, features and cleanup for the 6.15 cycle.

The usual mixture of new drivers, support in existing drivers for new
devices, a range of features and general subsystem cleanup.

Two merges of immutable branches in here:
* SPI offload support. Culmination of a long effort to bring the ability
  to offload triggered sequences of SPI operations to specific hardware,
  allow high datarate acquisition over an SPI bus (if you have the right
  hardware / FPGA firmware)
* GPIO set-array-helper - enables code simplification.

New device support
==================

adi,ad3552r-hs:
- Add support for AD3541r and AD3542r via newly supported FPGA HDL.
adi,ad4030
- New driver supporting the AD4030, AD4630 AD4630-16, AD4640-24, AD4632-16,
  AD4632-24 1 and 2 channel high precision SPI ADCs.
adi,ad4851
- New driver and backend support for the AD4851, AD4852, AD4853, AD4854,
  AD4855, AD4846, AD4857, AD4858 and AD4858I high speed multichannel
  simultaneous sampling ADCs.
adi,ad7191
- New driver for this 24-bit ADC for precision bridge applications,
adi,ad7380
- Add support for the adaq4381-4 which is a 14-bit version of the
  already supported adaq4380-1
adi,adis16550
- New driver using the ADIS library (which needed extensions) for this
  IMU.
brcm,apds9160
- New driver for this proximity and ambient light sensor.
dynaimage,al3000a
- New driver for this illuminance sensor.
mcube,mc3230
- Add support for the mc3510c accelerometer with a different scale to existing
  supported parts (some rework preceded this)
nxp,imx93
- Add compatibles for imx94 and imx95 which are fully compatible with imx93.
rockchip,saradc
- Add support for the RK3528 ADC
- Add support for the RK3562 ADC
silab,si7210
- New driver to support this I2C Hall effect magnetic position sensor.
ti,ads7138
- New driver supporting the ADS7128 and AD7138 I2C ADCs.

Staging driver drop
===================

adi,adis16240
- Drop this impact sensor. Interesting part but complex hence never left
  staging due to ABI challenges. No longer readily available so drop driver.

New features
============

Documentation
- A really nice overview document introduce ADC terminology and how
  it maps to IIO.
core
- New description for FAULT events, used in the ad7173.
- filter_type ABI used in ad4130.
buffer-dmaengine
- Split DMA channel request from buffer allocation (for SPI offload)
- Add a new _with_handle setup variant. (for SPI offload)
adi,adf4371
- Add control of reference clock type and support for frequency doubling
  where appropriate.
adi,ad4695
- Support SPI offload.
- Support oversampling control.
adi,ad5791
- Support SPI offload.
adi,ad7124
- Add channel calibration support.
adi,ad7380:
- Alert support (threshold interrupts)
- SPI offload support.
adi,ad7606
- Support writing registers when using backend enabling software control
  of modes.
adi,ad7944
- Support SPI offload.
adi,ad9832
- Use devm_regulator_get_enable() to simplify code.
adi,ad9834
- Use devm_regulator_get_enable() to simplify code.
adi,adxl345
- Improve IRQ handling code.
- Add debug access to registers.
bosch,bmi270
- Add temperature channel support.
- Add data ready trigger.
google,cross_ec
- Add trace events.
mcube,mc3230
- Add mount matrix support
- Add an OF match table.

Cleanup and minor bug fixes
===========================

Tree wide:
- Stop using iio_device_claim_direct_scoped() and introduce sparse friendly
  iio_device_claim/release_direct()

  The conditional scoped cleanup has proved hard to deal with, requiring
  workarounds for various compiler issues and in is rather non-intuitive
  so abandon that experiment. One of the attractions of that approach was
  that it made it much harder to have unbalanced   claim/release bugs so
  instead introduce a conditional-lock style boolean returning new pair
  of functions. These are inline in the header and have __acquire and
  __release calls allowing sparse to detect lack of balance.  There are
  occasional false positives but so far those have reflected complex code
  paths that benefited from cleanup anyway.
  The first set of driver conversions are in this pull request, more to
  follow next cycle. Various related cleanup in drivers.
  Removal of the _scoped code is completed and the definition removed.
- Use of str_enable_disable() and similar helpers.
- Don't set regmap cache to REGCACHE_NONE as that's the default anyway.
- Change some caches from RBTREE to MAPLE reflecting best practice.
- Use the new gpiod_multi_set_value_cansleep()
- Make sure to grab direct mode for some calibrations paths.
- Avoid using memcmp on structures when checking for matching channel configs.
  Instead just match field by field.
dt-bindings:
- Fix up indentation inconsistencies.
gts-helper:
- Simplify building of available scale table.
adi,ad-sigma-delta
- Make sure to disable channel after calibration done.
- Add error handling in configuring channel during calibration.
adi,ad2s1201
- use a bitmap_write() rather than directly accessing underlying storage.
adi,ad3552r-hs
- Fix a wrong error message.
- Make sure to use instruction mode for configuration.
adi,ad4695
- Add a conversion to ensure exit from conversion mode.
- Use custom regmap to handle required sclk rate change.
- Fix an out of bounds array access
- Simplify oversampling ratio handling.
adi,ad4851
- Fix a sign bug.
adi,ad5791
- Fix wrong exported number of storage bits.
adi,ad7124
- Disable all channels at probe to avoid strange initial configurations.
adi,ad7173
- Rework to allow static const struct ad_sigma_delta without need
  to make a copy.
adi,ad7623
- Drop a BSD license tag that the authors consider unnecessary.
adi,ad7768-1
- Fix channels sign description exposed to user space.
- Set MOSI idle state to avoid accidental device reset.
- Avoid some overkill locking.
adi,axi-dac
- Check if device interface is busy when enabling data stream.
- Add control of bus mode.
bosch,bmi270
- Move a struct definition to a c file as only used there.
vishay,veml6030
- Enable regmap cache to reduce bus traffic.
- Fix ABI bug around scale reporting.
vishay,vem6075
- Check array bounds to harden against broken hardware.

Various other minor tweaks and fixes not called out.

*

* tag 'iio-for-6.15a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (223 commits)
  doc: iio: ad7380: describe offload support
  iio: ad7380: add support for SPI offload
  iio: light: Add check for array bounds in veml6075_read_int_time_ms
  iio: adc: ti-ads7924 Drop unnecessary function parameters
  staging: iio: ad9834: Use devm_regulator_get_enable()
  staging: iio: ad9832: Use devm_regulator_get_enable()
  iio: gyro: bmg160_spi: add of_match_table
  dt-bindings: iio: adc: Add i.MX94 and i.MX95 support
  iio: adc: ad7768-1: remove unnecessary locking
  Documentation: ABI: add wideband filter type to sysfs-bus-iio
  iio: adc: ad7768-1: set MOSI idle state to prevent accidental reset
  iio: adc: ad7768-1: Fix conversion result sign
  iio: adc: ad7124: Benefit of dev = indio_dev->dev.parent in ad7124_parse_channel_config()
  iio: adc: ad7124: Implement system calibration
  iio: adc: ad7124: Implement internal calibration at probe time
  iio: adc: ad_sigma_delta: Add error checking for ad_sigma_delta_set_channel()
  iio: adc: ad4130: Adapt internal names to match official filter_type ABI
  iio: adc: ad7173: Fix comparison of channel configs
  iio: adc: ad7124: Fix comparison of channel configs
  iio: adc: ad4130: Fix comparison of channel setups
  ...
2025-03-14 07:15:12 +01:00
Thomas Gleixner ec2d0c0462 posix-timers: Provide a mechanism to allocate a given timer ID
Checkpoint/Restore in Userspace (CRIU) requires to reconstruct posix timers
with the same timer ID on restore. It uses sys_timer_create() and relies on
the monotonic increasing timer ID provided by this syscall. It creates and
deletes timers until the desired ID is reached. This is can loop for a long
time, when the checkpointed process had a very sparse timer ID range.

It has been debated to implement a new syscall to allow the creation of
timers with a given timer ID, but that's tideous due to the 32/64bit compat
issues of sigevent_t and of dubious value.

The restore mechanism of CRIU creates the timers in a state where all
threads of the restored process are held on a barrier and cannot issue
syscalls. That means the restorer task has exclusive control.

This allows to address this issue with a prctl() so that the restorer
thread can do:

   if (prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_ON))
      goto linear_mode;
   create_timers_with_explicit_ids();
   prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_OFF);
   
This is backwards compatible because the prctl() fails on older kernels and
CRIU can fall back to the linear timer ID mechanism. CRIU versions which do
not know about the prctl() just work as before.

Implement the prctl() and modify timer_create() so that it copies the
requested timer ID from userspace by utilizing the existing timer_t
pointer, which is used to copy out the allocated timer ID on success.

If the prctl() is disabled, which it is by default, timer_create() works as
before and does not try to read from the userspace pointer.

There is no problem when a broken or rogue user space application enables
the prctl(). If the user space pointer does not contain a valid ID, then
timer_create() fails. If the data is not initialized, but constains a
random valid ID, timer_create() will create that random timer ID or fail if
the ID is already given out. 
 
As CRIU must use the raw syscall to avoid manipulating the internal state
of the restored process, this has no library dependencies and can be
adopted by CRIU right away.

Recreating two timers with IDs 1000000 and 2000000 takes 1.5 seconds with
the create/delete method. With the prctl() it takes 3 microseconds.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Link: https://lore.kernel.org/all/87jz8vz0en.ffs@tglx
2025-03-13 12:07:18 +01:00
Dave Airlie 626fb11566 Linux 6.14-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfOKBUeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1aQH/iC+Oyij4VxAjBek
 BOXIT/p6CwlIXb8ObiWWcRjDPizlcxb3RaV8J2RO+IqaQ2wltxpFANq2G7Re2FPm
 SNcEpIURAOVcxHGedcfFA91srO5F4FzNTO8LVp7MIbcgMYy3pdk+dbZmi6A691R+
 t9pb74m+MAnF1o/MUx7pUlhAT/4ymuuR0F7WCSg4h0Xwe5m0nlJY89kJBC7PCjyd
 n3mdhsz3rDSLmt/z/T7HGD89r8sYSvm9cOKtL3ELgGTrm7boQV8ii9Y9w04DI8PQ
 JmIernugcCxmhH36mVUAHgJf2+/T388xFUh/D5+skeUOUZpaJZG866rnb32WpsHc
 eWLFUeg=
 =Wypt
 -----END PGP SIGNATURE-----

Backmerge tag 'v6.14-rc6' into drm-next

This is a backmerge from Linux 6.14-rc6, needed for the nova PR.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-03-12 09:43:12 +10:00
Greg Kroah-Hartman 34ff7999dc Counter updates for 6.15
counter:
  - Introduce the COUNTER_EVENT_DIRECTION_CHANGE event
  - Introduce the COUNTER_COMP_COMPARE helper macro
 microchip-tcb-cpature:
  - Add IRQ handling
  - Add support for capture extensions
  - Add support for compare extension
 ti-eqep:
  - Add support for reading and detecting changes in direction
 tools/counter:
  - Add counter_watch_events executable to .gitignore
  - Support COUNTER_EVENT_DIRECTION_CHANGE in counter_watch_events tool
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSNN83d4NIlKPjon7a1SFbKvhIjKwUCZ8+ECgAKCRC1SFbKvhIj
 K/i+AQDS3TVUcot2C+NsvNKud2hA2Q7IIIsAsorQgSdjnx9wDAD8CfQvuvEr6JhZ
 WuWxw8L59xJAYlInQjpP89pZQA4g+Ao=
 =b0Dk
 -----END PGP SIGNATURE-----

Merge tag 'counter-updates-for-6.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wbg/counter into char-misc-next

William writes:

Counter updates for 6.15

counter:
 - Introduce the COUNTER_EVENT_DIRECTION_CHANGE event
 - Introduce the COUNTER_COMP_COMPARE helper macro
microchip-tcb-cpature:
 - Add IRQ handling
 - Add support for capture extensions
 - Add support for compare extension
ti-eqep:
 - Add support for reading and detecting changes in direction
tools/counter:
 - Add counter_watch_events executable to .gitignore
 - Support COUNTER_EVENT_DIRECTION_CHANGE in counter_watch_events tool

* tag 'counter-updates-for-6.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wbg/counter:
  counter: microchip-tcb-capture: Add support for RC Compare
  counter: Introduce the compare component
  counter: microchip-tcb-capture: Add capture extensions for registers RA/RB
  counter: microchip-tcb-capture: Add IRQ handling
  counter: ti-eqep: add direction support
  tools/counter: add direction change event to watcher
  counter: add direction change event
  tools/counter: gitignore counter_watch_events
2025-03-11 11:01:57 +01:00
Johannes Berg cf12d3d71e wifi: cfg80211: improve supported_selector documentation
Improve the documentation for supported BSS selectors to make it more
precise.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250308225541.ba402ff47314.I502b56111b62ea0be174ae76bd03684ae1d4aefb@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-11 10:53:10 +01:00
Anjaneyulu cf4bd16088 wifi: cfg80211: allow IR in 20 MHz configurations
Some regulatory bodies doesn't allow IR (initiate radioation) on a
specific subband, but allows it for channels with a bandwidth of 20 MHz.
Add a channel flag that indicates that, and consider it in
cfg80211_reg_check_beaconing.

While on it, fix the kernel doc of enum nl80211_reg_rule_flags and
change it to use BIT().

Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com>
Co-developed-by: Somashekhar Puttagangaiah <somashekhar.puttagangaiah@intel.com>
Signed-off-by: Somashekhar Puttagangaiah <somashekhar.puttagangaiah@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250308225541.d3ab352a73ff.I8a8f79e1c9eb74936929463960ee2a324712fe51@changeid
[fix typo]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-11 10:53:01 +01:00
Johannes Berg 969241371f wifi: cfg80211: allow setting extended MLD capa/ops
Some extended MLD capabilities and operations bits (currently
the "BTM MLD Recommendataion For Multiple APs Support" bit)
may depend on userspace capabilities. Allow userspace to pass
the values for this field that it supports to the association
and link reconfiguration operations.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Link: https://patch.msgid.link/20250308225541.bd52078b5f65.I4dd8f53b0030db7ea87a2e0920989e7e2c7b5345@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-11 10:51:59 +01:00
Jeff Layton 9a28ac1762 lockd: add netlink control interface
The legacy rpc.nfsd tool will set the nlm_grace_period if the NFSv4
grace period is set. nfsdctl is missing this functionality, so add a new
netlink control interface for lockd that it can use. For now, it only
allows setting the grace period, and the tcp and udp listener ports.

lockd currently uses module parameters and sysctls for configuration, so
all of its settings are global. With this change, lockd now tracks these
values on a per-net-ns basis. It will only fall back to using the global
values if any of them are 0.

Finally, as a backward compatibility measure, if updating the nlm
settings in the init_net namespace, also update the legacy global
values to match.

Link: https://issues.redhat.com/browse/RHEL-71698
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:10:53 -04:00
Greg Kroah-Hartman 525b139fb4 Merge v6.14-rc6 into usb-next
Resolves the merge conflict with:
	drivers/usb/typec/ucsi/ucsi_acpi.c

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-10 08:16:31 +01:00
Dave Airlie 236f475d29 amdgpu:
- Fix spelling typos
 - RAS updates
 - VCN 5.0.1 updates
 - SubVP fixes
 - DCN 4.0.1 fixes
 - MSO DPCD fixes
 - DIO encoder refactor
 - PCON fixes
 - Misc cleanups
 - DMCUB fixes
 - USB4 DP fixes
 - DM cleanups
 - Backlight cleanups and fixes
 - Support platform backlight curves
 - Misc code cleanups
 - SMU 14 fixes
 - JPEG 4.0.3 reset updates
 - SR-IOV fixes
 - SVM fixes
 - GC 12 DCC fixes
 - DC DCE 6.x fix
 - Hiberation fix
 
 amdkfd:
 - Fix possible NULL pointer in queue validation
 - Remove unnecessary CP domain validation
 - SDMA queue reset support
 - Add per process flags
 
 radeon:
 - Fix spelling typos
 - RS400 hyperZ fix
 
 UAPI:
 - Add KFD per process flags for setting precision
   Proposed user space: 2a64fa5e06
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ8tfmgAKCRC93/aFa7yZ
 2CmkAP4xoy09aHecz6WOdNN9VBJiSMSZ3UOyBL9Ll2i5nI3YegEA7Co+AmQpXOPn
 HKeD6Vy1tt1XE/Liuur5dbLUKsVYSAI=
 =IQ45
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.15-2025-03-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amdgpu:
- Fix spelling typos
- RAS updates
- VCN 5.0.1 updates
- SubVP fixes
- DCN 4.0.1 fixes
- MSO DPCD fixes
- DIO encoder refactor
- PCON fixes
- Misc cleanups
- DMCUB fixes
- USB4 DP fixes
- DM cleanups
- Backlight cleanups and fixes
- Support platform backlight curves
- Misc code cleanups
- SMU 14 fixes
- JPEG 4.0.3 reset updates
- SR-IOV fixes
- SVM fixes
- GC 12 DCC fixes
- DC DCE 6.x fix
- Hiberation fix

amdkfd:
- Fix possible NULL pointer in queue validation
- Remove unnecessary CP domain validation
- SDMA queue reset support
- Add per process flags

radeon:
- Fix spelling typos
- RS400 hyperZ fix

UAPI:
- Add KFD per process flags for setting precision
  Proposed user space: 2a64fa5e06

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250307211051.1880472-1-alexander.deucher@amd.com
2025-03-10 09:04:52 +10:00
Bence Csókás 1adc6240a8 counter: microchip-tcb-capture: Add capture extensions for registers RA/RB
TCB hardware is capable of capturing the timer value to registers RA and
RB. Add these registers as capture extensions.

Signed-off-by: Bence Csókás <csokas.bence@prolan.hu>
Link: https://lore.kernel.org/r/20250306134441.582819-3-csokas.bence@prolan.hu
Signed-off-by: William Breathitt Gray <wbg@kernel.org>
2025-03-08 08:57:18 +09:00
Bence Csókás e5d5813968 counter: microchip-tcb-capture: Add IRQ handling
Add interrupt servicing to allow userspace to wait for the following:
* Change-of-state caused by external trigger
* Capture of timer value into RA/RB
* Compare to RC register
* Overflow

Signed-off-by: Bence Csókás <csokas.bence@prolan.hu>
Link: https://lore.kernel.org/r/20250306134441.582819-2-csokas.bence@prolan.hu
Signed-off-by: William Breathitt Gray <wbg@kernel.org>
2025-03-08 08:57:02 +09:00
Harish Kasiviswanathan cf6d949a40 drm/amdkfd: Add support for more per-process flag
Add support for more per-process flags starting with option to configure
MFMA precision for gfx 9.5

v2: Change flag name to KFD_PROC_FLAG_MFMA_HIGH_PRECISION
    Remove unused else condition
v3: Bump the KFD API version
v4: Missed SH_MEM_CONFIG__PRECISION_MODE__SHIFT define. Added it.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07 15:33:49 -05:00
Timur Tabi b0db1ed176 elf: add remaining SHF_ flag macros
Add the remaining SHF_ flags, as listed in the "Executable and
Linkable Format" Wikipedia page and the System V Application Binary
Interface[1]. This allows drivers to load and parse ELF images that use
some of those flags.

In particular, an upcoming change to the Nouveau GPU driver will
use some of the flags.

Link: https://refspecs.linuxfoundation.org/elf/gabi4+/ch4.sheader.html#sh_flags [1]
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Link: https://lore.kernel.org/r/20250307171417.267488-1-ttabi@nvidia.com
Signed-off-by: Kees Cook <kees@kernel.org>
2025-03-07 12:20:28 -08:00
Zhiyuan Dai 5af473941b PCI: Increase Resizable BAR support from 512 GB to 128 TB
Per PCIe r6.0, sec 7.8.6.2, devices can advertise Resizable BAR sizes up to
128 TB in the Resizable BAR Capability register.  Larger sizes can be
advertised via the Capability register, but that requires an API change.

Update pci_rebar_get_possible_sizes() and pbus_size_mem() to increase the
sizes we currently support from 512 GB to 128 TB.

Link: https://lore.kernel.org/r/20250307053535.44918-1-daizhiyuan@phytium.com.cn
Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-07 11:23:34 -06:00
Pavel Begunkov bdabba04bb io_uring/rw: implement vectored registered rw
Implement registered buffer vectored reads with new opcodes
IORING_OP_WRITEV_FIXED and IORING_OP_READV_FIXED.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d7c89eb481e870f598edc91cc66ff4d1e4ae3788.1741362889.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07 09:07:29 -07:00
Jens Axboe 6e3da40ed6 Merge branch 'for-6.15/io_uring-epoll-wait' into for-6.15/io_uring-reg-vec
* for-6.15/io_uring-epoll-wait:
  io_uring/epoll: add support for IORING_OP_EPOLL_WAIT
  io_uring/epoll: remove CONFIG_EPOLL guards
  eventpoll: add epoll_sendevents() helper
  eventpoll: abstract out ep_try_send_events() helper
  eventpoll: abstract out parameter sanity checking
2025-03-07 09:07:19 -07:00
Jens Axboe 78b6f6e9bf Merge branch 'for-6.15/io_uring-rx-zc' into for-6.15/io_uring-reg-vec
* for-6.15/io_uring-rx-zc: (80 commits)
  io_uring/zcrx: add selftest case for recvzc with read limit
  io_uring/zcrx: add a read limit to recvzc requests
  io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap
  io_uring: Rename KConfig to Kconfig
  io_uring/zcrx: fix leaks on failed registration
  io_uring/zcrx: recheck ifq on shutdown
  io_uring/zcrx: add selftest
  net: add documentation for io_uring zcrx
  io_uring/zcrx: add copy fallback
  io_uring/zcrx: throttle receive requests
  io_uring/zcrx: set pp memory provider for an rx queue
  io_uring/zcrx: add io_recvzc request
  io_uring/zcrx: dma-map area for the device
  io_uring/zcrx: implement zerocopy receive pp memory provider
  io_uring/zcrx: grab a net device
  io_uring/zcrx: add io_zcrx_area
  io_uring/zcrx: add interface queue and refill queue
  net: add helpers for setting a memory provider on an rx queue
  net: page_pool: add memory provider helpers
  net: prepare for non devmem TCP memory providers
  ...
2025-03-07 09:07:11 -07:00
Jens Axboe 94765d71a0 Merge branch 'for-6.15/io_uring' into for-6.15/io_uring-reg-vec
* for-6.15/io_uring: (80 commits)
  io_uring: introduce io_cache_free() helper
  io_uring/rsrc: skip NULL file/buffer checks in io_free_rsrc_node()
  io_uring/rsrc: avoid NULL node check on io_sqe_buffer_register() failure
  io_uring/rsrc: call io_free_node() on io_sqe_buffer_register() failure
  io_uring/rsrc: free io_rsrc_node using kfree()
  io_uring/rsrc: split out io_free_node() helper
  io_uring/rsrc: include io_uring_types.h in rsrc.h
  ublk: don't cast registered buffer index to int
  io_uring/nop: use io_find_buf_node()
  io_uring/rsrc: declare io_find_buf_node() in header file
  io_uring/ublk: report error when unregister operation fails
  io_uring: convert cmd_to_io_kiocb() macro to function
  io_uring/uring_cmd: specify io_uring_cmd_import_fixed() pointer type
  io_uring/rsrc: use rq_data_dir() to compute bvec dir
  selftests: ublk: add ublk zero copy test
  selftests: ublk: add file backed ublk
  selftests: ublk: add kernel selftests for ublk
  io_uring: cache nodes and mapped buffers
  ublk: zc register/unregister bvec
  io_uring: add support for kernel registered bvecs
  ...
2025-03-07 09:07:07 -07:00
Alistair Francis f810d17762 PCI/DOE: Rename Discovery Response Data Object Contents to type
PCIe r6.1, sec 6.30.1.1, describes a "Vendor ID", a "Data Object Type" and
"Next Index" as the fields in the DOE Discovery Response Data Object. The
DOE driver currently uses both the terms 'type' and 'prot' for the second
element.

Rename all uses of the DOE Discovery Response Data Object to use 'type' as
the second element of the object header, instead of type/prot as it
currently is.

Link: https://lore.kernel.org/r/20250306075211.1855177-2-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-06 12:48:16 -06:00
Vincent Mailhol 0312e94abe treewide: fix typo 'unsigned __init128' -> 'unsigned __int128'
"int" was misspelled as "init" the code comments in the bits.h and
const.h files. Fix the typo.

CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
2025-03-05 12:00:03 -05:00
Jonathan Kim ceb7114c96 drm/amdkfd: flag per-sdma queue reset supported to user space
Similar to compute queue reset, flag SDMA queue reset capabilities to
user space for safe testing.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05 10:47:33 -05:00
Christian Brauner 7477d7dce4
pidfs: allow to retrieve exit information
Some tools like systemd's jounral need to retrieve the exit and cgroup
information after a process has already been reaped. This can e.g.,
happen when retrieving a pidfd via SCM_PIDFD or SCM_PEERPIDFD.

Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-6-c8c3d8361705@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 13:26:17 +01:00
Jakub Kicinski 71f8992e34 First 6.15 material:
* cfg80211/mac80211
    - remove cooked monitor support
    - strict mode for better AP testing
    - basic EPCS support
    - OMI RX bandwidth reduction support
  * rtw88
    - preparation for RTL8814AU support
  * rtw89
    - use wiphy_lock/wiphy_work
    - preparations for MLO
    - BT-Coex improvements
    - regulatory support in firmware files
  * iwlwifi
    - preparations for the new iwlmld sub-driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmfG9tQACgkQ10qiO8sP
 aAAE1g/9Fg6vrTY1OX0at1iXL5AMqtfAVUJffHGz8SyWJNzKgP0pLtXUTi1Ta2b4
 TmHG0+4fMcwM0nigL0cg/F6vEE5V7qM30b7WMhIKpBejcCklINEIYuB1a6CwfDNF
 7QtE/NA9/oKBy2JftYpm+4mdtlKdwBdkpz3MKE8/BpB1HdmG3IS3/z9+t4jkMOv+
 hRxZKRlqpW98uimc5JjkjMGyt/e4QyHiUt0Z7Ly3dzm0PajvmRy4Idkl3TFMi4Yp
 GgKFTix9/7G4iYFoyK5iNiqfesnzqY6jaI3sxv6h6AG/Yz8EOkBNNnKovPC0lAo/
 ZpkhU63XN10SZXxeSDSiz4tWx3uBxpN8PIRvuaEr+KW7caCjMYXAuB5uE+KvREA3
 hkM8zB5IkT54bLaHJYG6TGQoWxRnzoEDjURBHkEG9DfJs2ixKxdLfe/Rc1lvLqt4
 t5ejc+UQsjICZ31LhrZkNbfmKLtQfFCcBj3ZUHz9RzO1AXmsqudeRcl0j3rDjsqY
 AdYKFEKGCKeBlhr+Cng9VqzJa7gCmgVgkXYjGxJgbL0n3u6oV/S08/7Ssbzf7RXY
 ulFqxEjaexn/vtUHVwMOTxyDttftOfB3Y1IrNUJj3OXWpuXK247OQN6QHM9XWtVy
 xzBOW3g9N0L3EUM4PVmZTGU4SdtuwNxvZbSrToBfnjVu0rESAT4=
 =KYXQ
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2025-03-04-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
First 6.15 material:
 * cfg80211/mac80211
   - remove cooked monitor support
   - strict mode for better AP testing
   - basic EPCS support
   - OMI RX bandwidth reduction support
 * rtw88
   - preparation for RTL8814AU support
 * rtw89
   - use wiphy_lock/wiphy_work
   - preparations for MLO
   - BT-Coex improvements
   - regulatory support in firmware files
 * iwlwifi
   - preparations for the new iwlmld sub-driver

* tag 'wireless-next-2025-03-04-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (128 commits)
  wifi: iwlwifi: remove mld/roc.c
  wifi: mac80211: refactor populating mesh related fields in sinfo
  wifi: cfg80211: reorg sinfo structure elements for mesh
  wifi: iwlwifi: Fix spelling mistake "Increate" -> "Increase"
  wifi: iwlwifi: add Debug Host Command APIs
  wifi: iwlwifi: add IWL_MAX_NUM_IGTKS macro
  wifi: iwlwifi: add OMI bandwidth reduction APIs
  wifi: iwlwifi: remove mvm prefix from iwl_mvm_d3_end_notif
  wifi: iwlwifi: remember if the UATS table was read successfully
  wifi: iwlwifi: export iwl_get_lari_config_bitmap
  wifi: iwlwifi: add support for external 32 KHz clock
  wifi: iwlwifi: mld: add a debug level for EHT prints
  wifi: iwlwifi: mld: add a debug level for PTP prints
  wifi: iwlwifi: remove mvm prefix from iwl_mvm_esr_mode_notif
  wifi: iwlwifi: use 0xff instead of 0xffffffff for invalid
  wifi: iwlwifi: location api cleanup
  wifi: cfg80211: expose update timestamp to drivers
  wifi: mac80211: add ieee80211_iter_chan_contexts_mtx
  wifi: mac80211: fix integer overflow in hwmp_route_info_get()
  wifi: mac80211: Fix possible integer promotion issue
  ...
====================

Link: https://patch.msgid.link/20250304125605.127914-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 08:50:42 -08:00
Mauro Carvalho Chehab 62d6d20257 drivers: firewire: firewire-cdev.h: fix identation on a kernel-doc markup
The description of @tstamp parameter has one line that starts at the
beginning. This moves such line to the description, which is not the
intent here.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/8238bed1c0375e6b389a8cafe1ad99fdeb1cb1f2.1740387599.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2025-03-04 09:47:36 -07:00
Nicolas Dichtel 4754affe0b net: advertise netns_immutable property via netlink
Since commit 05c1280a2b ("netdev_features: convert NETIF_F_NETNS_LOCAL to
dev->netns_local"), there is no way to see if the netns_immutable property
s set on a device. Let's add a netlink attribute to advertise it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04 12:44:48 +01:00
Thomas Weißschuh e0d15896f5 elf, uapi: Add types ElfXX_Verdef and ElfXX_Veraux
The types are used by tools/testing/selftests/vDSO/parse_vdso.c.

To be able to build the vDSO selftests without a libc dependency,
add the types to the kernels own UAPI headers.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html#VERDEFEXTS
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-6-28e14e031ed8@linutronix.de
2025-03-03 20:00:12 +01:00
Thomas Weißschuh 2c86f604f8 elf, uapi: Add type ElfXX_Versym
The type is used by tools/testing/selftests/vDSO/parse_vdso.c.

To be able to build the vDSO selftests without a libc dependency,
add the type to the kernels own UAPI headers. As documented by elf(5).

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-5-28e14e031ed8@linutronix.de
2025-03-03 20:00:11 +01:00
Thomas Weißschuh 049d19bb38 elf, uapi: Add definitions for VER_FLG_BASE and VER_FLG_WEAK
The definitions are used by tools/testing/selftests/vDSO/parse_vdso.c.

To be able to build the vDSO selftests without a libc dependency,
add the definitions to the kernels own UAPI headers.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://docs.oracle.com/cd/E19683-01/816-1386/chapter6-80869/index.html
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-4-28e14e031ed8@linutronix.de
2025-03-03 20:00:11 +01:00
Thomas Weißschuh 50881d1469 elf, uapi: Add definition for DT_GNU_HASH
The definition is used by tools/testing/selftests/vDSO/parse_vdso.c.

To be able to build the vDSO selftests without a libc dependency,
add the define to the kernels own UAPI headers.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/libc-ddefs.html
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-3-28e14e031ed8@linutronix.de
2025-03-03 20:00:11 +01:00
Thomas Weißschuh c413114096 elf, uapi: Add definition for STN_UNDEF
The definition is used by tools/testing/selftests/vDSO/parse_vdso.c.

To be able to build the vDSO selftests without a libc dependency,
add the definition to the kernels own UAPI headers.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://refspecs.linuxfoundation.org/elf/gabi4+/ch4.symtab.html
Link: https://lore.kernel.org/all/20250226-parse_vdso-nolibc-v2-2-28e14e031ed8@linutronix.de
2025-03-03 20:00:11 +01:00
Ming Lei e84025d2a9 ublk: add DMA alignment limit
The in-tree ublk driver doesn't need DMA alignment limit because there
is one data copy between request pages and the userspace buffer.

However, ublk is going to support zero copy, then DMA alignment limit
is required, because same IO buffer is forwarded to backend which may
have specific buffer DMA alignment limit, so the limit has to be exposed
from the frontend driver to client application.

Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250227103707.2640014-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-03 11:17:52 -07:00
Yunke Cao 2dc768d71b media: uvcvideo: implement UVC v1.5 ROI
Implement support for ROI as described in UVC 1.5:
4.2.2.1.20 Digital Region of Interest (ROI) Control

ROI control is implemented using V4L2 control API as
two UVC-specific controls:
V4L2_CID_UVC_REGION_OF_INTEREST_RECT and
V4L2_CID_UVC_REGION_OF_INTEREST_AUTO.

Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Yunke Cao <yunkec@google.com>
Reviewed-by: Yunke Cao <yunkec@google.com>
Tested-by: Yunke Cao <yunkec@google.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Link: https://lore.kernel.org/r/20250203-uvc-roi-v17-16-5900a9fed613@chromium.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
[hverkuil: fix control names: "Of" -> "of", "Controls" -> "Ctrls"]
2025-03-03 18:23:36 +01:00
Hans Verkuil a5bd42aafb media: v4l2-ctrls: add support for V4L2_CTRL_WHICH_MIN/MAX_VAL
Add the capability of retrieving the min and max values of a
compound control.

[Ricardo: Added static to v4l2_ctrl_type_op_(maximum|minimum) proto]
[Ricardo: Fix documentation]

Signed-off-by: Yunke Cao <yunkec@google.com>
Tested-by: Yunke Cao <yunkec@google.com>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Link: https://lore.kernel.org/r/20250203-uvc-roi-v17-2-5900a9fed613@chromium.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
[hverkuil: fix small alignment checkpatch warning]
2025-03-03 18:23:35 +01:00
Yunke Cao 3b9d7340cf media: v4l2_ctrl: Add V4L2_CTRL_TYPE_RECT
Add p_rect to struct v4l2_ext_control with basic support in
v4l2-ctrls.

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Signed-off-by: Yunke Cao <yunkec@google.com>
Reviewed-by: Hans Verkuil <hverkuil@xs4all.nl>
Tested-by: Yunke Cao <yunkec@google.com>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Link: https://lore.kernel.org/r/20250203-uvc-roi-v17-1-5900a9fed613@chromium.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-03-03 18:23:35 +01:00
Kees Cook 4c2d8a6a54 nilfs2: Mark on-disk strings as nonstring
In preparation for memtostr*() checking that its source is marked as
nonstring, annotate the device strings accordingly using the new UAPI
alias for the "nonstring" attribute.

Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-28 11:51:32 -08:00
Kees Cook 3407caa69a uapi: stddef.h: Introduce __kernel_nonstring
In order to annotate byte arrays in UAPI that are not C strings (i.e.
they may not be NUL terminated), the "nonstring" attribute is needed.
However, we can't expose this to userspace as it is compiler version
specific.

Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-28 11:51:32 -08:00
I Hsin Cheng 1e7933a575 uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"
This patch reverts 'commit c32ee3d9abd2("bitops: avoid integer overflow in
 GENMASK(_ULL)")'.

The code generation can be shrink by over 1KB by reverting this commit.
Originally the commit claimed that clang would emit warnings using the
implementation at that time.

The patch was applied and tested against numerous compilers, including
gcc-13, gcc-12, gcc-11 cross-compiler, clang-17, clang-18 and clang-19.
Various warning levels were set (-W=0, -W=1, -W=2) and CONFIG_WERROR
disabled to complete the compilation. The results show that no compilation
errors or warnings were generated due to the patch.

The results of code size reduction are summarized in the following table.
The code size changes for clang are all zero across different versions,
so they're not listed in the table.

For NR_CPUS=64 on x86_64.
----------------------------------------------
|	        |   gcc-13 |   gcc-12 |   gcc-11 |
----------------------------------------------
|       old | 22438085 | 22453915 | 22302033 |
----------------------------------------------
|       new | 22436816 | 22452913 | 22300826 |
----------------------------------------------
| new - old |    -1269 |    -1002 |    -1207 |
----------------------------------------------

For NR_CPUS=1024 on x86_64.
----------------------------------------------
|	        |   gcc-13 |   gcc-12 |   gcc-11 |
----------------------------------------------
|       old | 22493682 | 22509812 | 22357661 |
----------------------------------------------
|       new | 22493230 | 22509487 | 22357250 |
----------------------------------------------
| new - old |     -452 |     -325 |     -411 |
----------------------------------------------

For arm64 architecture, gcc cross-compiler was used and QEMU was
utilized to execute a VM for a CPU-heavy workload to ensure no
side effects and that functionalities remained correct. The test
even demonstrated a positive result in terms of code size reduction:
* Before: 31660668
* After: 31658724
* Difference (After - Before): -1944

An analysis of multiple functions compiled with gcc-13 on x86_64 was
performed. In summary, the patch elimates one negation in almost
every use case. However, negative effects may occur in some cases,
such as the generation of additional "mov" instruction or increased
register usage. The use of "~_UL(0) << (l)" may even result in the
allocations of "%r*" registers instead of "%e*" registers (which are
32-bit registers) because the compiler cannot assume that the higher
bits are zero.

Yury:

We limit GENMASK() usage with the const_true(l > h) condition, and
most of users just call it with constant parameters. For those, the
actual implementation of the macro doesn't matter, and since it
triggered clang warnings back then, it was reasonable to workaround
the warnings on the kernel side.

Now that some find_bit() functions call GENMASK() with runtime
parameters (although the const_true() condition holds), this ended up
hurting the generated code, as I Hsin discovered. This is especially
bad because it hurts small_const_nbits() optimization, where people are
most concerned about generated code quality. So, revert it to the
original version for good.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
2025-02-28 12:36:11 -05:00
Keith Busch 1f6540e2aa ublk: zc register/unregister bvec
Provide new operations for the user to request mapping an active request
to an io uring instance's buf_table. The user has to provide the index
it wants to install the buffer.

A reference count is taken on the request to ensure it can't be
completed while it is active in a ring's buf_table.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250227223916.143006-6-kbusch@meta.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28 07:05:41 -07:00
Daniel Borkmann e1f95b1992 geneve: Allow users to specify source port range
Recently, in case of Cilium, we run into users on Azure who require to use
tunneling for east/west traffic due to hitting IPAM API limits for Kubernetes
Pods if they would have gone with publicly routable IPs for Pods. In case
of tunneling, Cilium supports the option of vxlan or geneve. In order to
RSS spread flows among remote CPUs both derive a source port hash via
udp_flow_src_port() which takes the inner packet's skb->hash into account.
For clusters with many nodes, this can then hit a new limitation [0]: Today,
the Azure networking stack supports 1M total flows (500k inbound and 500k
outbound) for a VM. [...] Once this limit is hit, other connections are
dropped. [...] Each flow is distinguished by a 5-tuple (protocol, local IP
address, remote IP address, local port, and remote port) information. [...]

For vxlan and geneve, this can create a massive amount of UDP flows which
then run into the limits if stale flows are not evicted fast enough. One
option to mitigate this for vxlan is to narrow the source port range via
IFLA_VXLAN_PORT_RANGE while still being able to benefit from RSS. However,
geneve currently does not have this option and it spreads traffic across
the full source port range of [1, USHRT_MAX]. To overcome this limitation
also for geneve, add an equivalent IFLA_GENEVE_PORT_RANGE setting for users.

Note that struct geneve_config before/after still remains at 2 cachelines
on x86-64. The low/high members of struct ifla_geneve_port_range (which is
uapi exposed) are of type __be16. While they would be perfectly fine to be
of __u16 type, the consensus was that it would be good to be consistent
with the existing struct ifla_vxlan_port_range from a uapi consumer PoV.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://learn.microsoft.com/en-us/azure/virtual-network/virtual-machine-network-throughput [0]
Link: https://patch.msgid.link/20250226182030.89440-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27 16:54:54 -08:00
David Yat Sin e90711946b drm/amdkfd: clamp queue size to minimum
If queue size is less than minimum, clamp it to minimum to prevent
underflow when writing queue mqd.

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 16:50:04 -05:00
Jakub Kicinski 357660d759 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.14-rc5).

Conflicts:

drivers/net/ethernet/cadence/macb_main.c
  fa52f15c74 ("net: cadence: macb: Synchronize stats calculations")
  75696dd0fd ("net: cadence: macb: Convert to get_stats64")
https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/intel/ice/ice_sriov.c
  79990cf5e7 ("ice: Fix deinitializing VF in error path")
  a203163274 ("ice: simplify VF MSI-X managing")

net/ipv4/tcp.c
  18912c5206 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace")
  297d389e9e ("net: prefix devmem specific helpers")

net/mptcp/subflow.c
  8668860b0a ("mptcp: reset when MPTCP opts are dropped after join")
  c3349a22c2 ("mptcp: consolidate subflow cleanup")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27 10:20:58 -08:00
Jens Axboe c0d8c0362b Merge branch 'io_uring-6.14' into for-6.15/io_uring
Merge mainline fixes into 6.15 branch, as upcoming patches depend on
fixes that went into the 6.14 mainline branch.

* io_uring-6.14:
  io_uring/net: save msg_control for compat
  io_uring/rw: clean up mshot forced sync mode
  io_uring/rw: move ki_complete init into prep
  io_uring/rw: don't directly use ki_complete
  io_uring/rw: forbid multishot async reads
  io_uring/rsrc: remove unused constants
  io_uring: fix spelling error in uapi io_uring.h
  io_uring: prevent opcode speculation
  io-wq: backoff when retrying worker creation
2025-02-27 07:18:01 -07:00
Eric Dumazet 3ba075278c tcp: be less liberal in TSEcr received while in SYN_RECV state
Yong-Hao Zou mentioned that linux was not strict as other OS in 3WHS,
for flows using TCP TS option (RFC 7323)

As hinted by an old comment in tcp_check_req(),
we can check the TSEcr value in the incoming packet corresponds
to one of the SYNACK TSval values we have sent.

In this patch, I record the oldest and most recent values
that SYNACK packets have used.

Send a challenge ACK if we receive a TSEcr outside
of this range, and increase a new SNMP counter.

nstat -az | grep TSEcrRejected
TcpExtTSEcrRejected            0                  0.0

Due to TCP fastopen implementation, do not apply yet these checks
for fastopen flows.

v2: No longer use req->num_timeout, but treq->snt_tsval_first
    to detect when first SYNACK is prepared. This means
    we make sure to not send an initial zero TSval.
    Make sure MPTCP and TCP selftests are passing.
    Change MIB name to TcpExtTSEcrRejected

v1: https://lore.kernel.org/netdev/CADVnQykD8i4ArpSZaPKaoNxLJ2if2ts9m4As+=Jvdkrgx1qMHw@mail.gmail.com/T/

Reported-by: Yong-Hao Zou <yonghaoz1994@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250225171048.3105061-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26 18:11:17 -08:00
Linus Torvalds c0d35086a2 Landlock fix for v6.14-rc5
-----BEGIN PGP SIGNATURE-----
 
 iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZ785FhAcbWljQGRpZ2lr
 b2QubmV0AAoJEOXj0OiMgvbSILQBAMFwpFClzjVeWyLFNd/gaTlPWeedvnag+yZu
 CK9q39jOAP9tj1unQBFpsI7jsTk6ZxPVxb4DymPIirZMb/5FuxUnDQ==
 =QrSM
 -----END PGP SIGNATURE-----

Merge tag 'landlock-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock fixes from Mickaël Salaün:
 "Fixes to TCP socket identification, documentation, and tests"

* tag 'landlock-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  selftests/landlock: Add binaries to .gitignore
  selftests/landlock: Test that MPTCP actions are not restricted
  selftests/landlock: Test TCP accesses with protocol=IPPROTO_TCP
  landlock: Fix non-TCP sockets restriction
  landlock: Minor typo and grammar fixes in IPC scoping documentation
  landlock: Fix grammar error
  selftests/landlock: Enable the new CONFIG_AF_UNIX_OOB
2025-02-26 11:55:44 -08:00
Sebastian Ott 3adaee7830 KVM: arm64: Allow userspace to change the implementation ID registers
KVM's treatment of the ID registers that describe the implementation
(MIDR, REVIDR, and AIDR) is interesting, to say the least. On the
userspace-facing end of it, KVM presents the values of the boot CPU on
all vCPUs and treats them as invariant. On the guest side of things KVM
presents the hardware values of the local CPU, which can change during
CPU migration in a big-little system.

While one may call this fragile, there is at least some degree of
predictability around it. For example, if a VMM wanted to present
big-little to a guest, it could affine vCPUs accordingly to the correct
clusters.

All of this makes a giant mess out of adding support for making these
implementation ID registers writable. Avoid breaking the rather subtle
ABI around the old way of doing things by requiring opt-in from
userspace to make the registers writable.

When the cap is enabled, allow userspace to set MIDR, REVIDR, and AIDR
to any non-reserved value and present those values consistently across
all vCPUs.

Signed-off-by: Sebastian Ott <sebott@redhat.com>
[oliver: changelog, capability]
Link: https://lore.kernel.org/r/20250225005401.679536-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26 01:32:16 -08:00
Gal Pressman ecdff89338 ethtool: Symmetric OR-XOR RSS hash
Add an additional type of symmetric RSS hash type: OR-XOR.
The "Symmetric-OR-XOR" algorithm transforms the input as follows:

(SRC_IP | DST_IP, SRC_IP ^ DST_IP, SRC_PORT | DST_PORT, SRC_PORT ^ DST_PORT)

Change 'cap_rss_sym_xor_supported' to 'supported_input_xfrm', a bitmap
of supported RXH_XFRM_* types.

Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250224174416.499070-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25 18:31:04 -08:00
Jonas Gottlieb 6002850fdf Add OVN to `rtnetlink.h`
- The Open Virtual Network (OVN) routing netlink handler uses ID 84
- Will also add to `/etc/iproute2/rt_protos` once this is accepted
- For more information: https://github.com/ovn-org/ovn

Signed-off-by: Jonas Gottlieb <jonas.gottlieb@stackit.cloud>
Link: https://patch.msgid.link/Z7w_e7cfA3xmHDa6@SIT-SDELAP4051.int.lidl.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25 18:14:03 -08:00
Sven Eckelmann 9443335502 batman-adv: Use consistent name for mesh interface
The way how the virtual interface is called inside the batman-adv source
code is not consistent. The genl headers call it meshif and the rest of the
code calls is (mostly) softif.

The genl definitions cannot be touched because they are part of the UAPI.
But the rest of the batman-adv code can be touched to have a consistent
name again.

The bulk of the renaming was done using

  sed -i -e 's/soft\(-\|\_\| \|\)i\([nf]\)/mesh\1i\2/g' \
         -e 's/SOFT\(-\|\_\| \|\)I\([NF]\)/MESH\1I\2/g'

and then it was adjusted slightly when proofreading the changes.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-22 11:36:22 +01:00
Jeremy Kerr dcc35baae7 usb: Add base USB MCTP definitions
Upcoming changes will add a USB host (and later gadget) driver for the
MCTP-over-USB protocol. Add a header that provides common definitions
for protocol support: the packet header format and a few framing
definitions. Add a define for the MCTP class code, as per
https://usb.org/defined-class-codes.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-1-3353030fe9cc@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:45:21 -08:00
Ido Schimmel ca4edd969a net: fib_rules: Add DSCP mask attribute
Add an attribute that allows matching on DSCP with a mask. Matching on
DSCP with a mask is needed in deployments where users encode path
information into certain bits of the DSCP field.

Temporarily set the type of the attribute to 'NLA_REJECT' while support
is being added.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20250220080525.831924-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 16:08:47 -08:00
Jakub Kicinski e87700965a bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCZ7ffOQAKCRAraS+Z+3y5
 EzVHAP9h/QkeYoOZW9gul08I8vFiZsFe/lbOSLJWxeVfxb9JhgD/cMqby3qAxQK6
 lsdNQ9jYG2232Wym89ag7fvTBK15Wg4=
 =gkN2
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-02-20

We've added 19 non-merge commits during the last 8 day(s) which contain
a total of 35 files changed, 1126 insertions(+), 53 deletions(-).

The main changes are:

1) Add TCP_RTO_MAX_MS support to bpf_set/getsockopt, from Jason Xing

2) Add network TX timestamping support to BPF sock_ops, from Jason Xing

3) Add TX metadata Launch Time support, from Song Yoong Siang

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  igc: Add launch time support to XDP ZC
  igc: Refactor empty frame insertion for launch time support
  net: stmmac: Add launch time support to XDP ZC
  selftests/bpf: Add launch time request to xdp_hw_metadata
  xsk: Add launch time hardware offload support to XDP Tx metadata
  selftests/bpf: Add simple bpf tests in the tx path for timestamping feature
  bpf: Support selective sampling for bpf timestamping
  bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
  net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING
  bpf: Disable unsafe helpers in TX timestamping callbacks
  bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
  bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping
  bpf: Add networking timestamping support to bpf_get/setsockopt()
  selftests/bpf: Add rto max for bpf_setsockopt test
  bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt
====================

Link: https://patch.msgid.link/20250221022104.386462-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21 15:59:47 -08:00
Ilpo Järvinen 7e077e6707 PCI/ERR: Handle TLP Log in Flit mode
Flit mode introduced in PCIe r6.0 alters how the TLP Header Log is
presented through AER and DPC Capability registers. The TLP Prefix Log
Register is not present with Flit mode, and the register becomes an
extension of the TLP Header Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13).

Adapt pcie_read_tlp_log() and struct pcie_tlp_log to read and store the
extended TLP Header Log when the Link is in Flit mode. As the Prefix Log
and Extended TLP Header are not present at the same time, a C union can be
used.

Determining whether the error occurred while the Link was in Flit mode is a
bit complicated. In case of AER, the Advanced Error Capabilities and
Control Register directly tells whether the error was logged in Flit mode
or not (PCIe r6.1 sec 7.8.4.7). The DPC Capability (PCIe r6.1 sec 7.9.14),
unfortunately, does not contain the same information.

Unlike AER, the DPC Capability does not provide a way to discern whether
the error was logged in Flit mode (this is confirmed by PCI WG to be an
oversight in the spec). DPC will bring the Link down immediately following
an error, which makes it impossible to acquire the Flit Mode Status
directly from the Link Status 2 register because Flit Mode Status is only
set in certain Link states (PCIe r6.1 sec 7.5.3.20). As a workaround, use
the flit_mode value stored into the struct pci_bus.

Link: https://lore.kernel.org/r/20250207161836.2755-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-21 17:31:45 -06:00
Linus Torvalds f679ebf6aa io_uring-6.14-20250221
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAme4rVYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpodqD/9hVmNuIJTpfLNdKNGgsgE1VRhtTtKgISOV
 U2c2NDVvx6brqyRL2JGfC3DhZyXvC98qEKXfsJf0lbDPh/pP3oQMLWmPJWsJKVUt
 DpE9RzI89/objCvIUNMLS9svixmBcVSBnF9x1ppGl6Ou4+NSkS72aswMNlP3qLr9
 +UnAxEFWCCn/8sl29GdtUDXSm8l/u6oftxeGhXTaczwEVUSvclyTqrsqoy05z9ib
 Ar5PC9h7crwCgrnMahAUkM603Vi3gZzkDiQre/GK28HN7nx0TbwB9inRsxrO3sv5
 01rwwA67oN7sEp2AM3Svh4MyBb3wyNRzcynZMgml2YM+0RRmstmeT2VBjilCRVCT
 s5pO+KnJ1Q64rYcaqPU5+hqfJ55d8qveBjB8Omu4BrFaf7HuzgLq1bbEBrmyoHKm
 TQvM/USnwycOs6FrdAWS5X+nx9pFIDowd6ZqLckVhWxjnIsi/WeyNoJFP9U887X7
 YdatkNoqY7EApykzUxVKzuSw1sgADE9HvnBSrfBcbyxA3Wpl52K+Y+/NyRjlJamF
 aGP8FpM1ZoOcwyWhi0By8PmB4flSnirYQ7Ysu0Wv12CoNylHv1n1pbbGNZ/LuXGN
 MrrE4jQLk5cVEfSk/+C0G5MreUkwav8K5cnEoqq36peq9LXW36jQdWkYieZMCbun
 lKODmRPAgA==
 =w+aR
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-6.14-20250221' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Series fixing an issue with multishot read on pollable files that may
   return -EIOCBQUEUED from ->read_iter(). Four small patches for that,
   the first one deliberately done in such a way that it'd be easy to
   backport

 - Remove some dead constant definitions

 - Use array_index_nospec() for opcode indexing

 - Work-around for worker creation retries in the presence of signals

* tag 'io_uring-6.14-20250221' of git://git.kernel.dk/linux:
  io_uring/rw: clean up mshot forced sync mode
  io_uring/rw: move ki_complete init into prep
  io_uring/rw: don't directly use ki_complete
  io_uring/rw: forbid multishot async reads
  io_uring/rsrc: remove unused constants
  io_uring: fix spelling error in uapi io_uring.h
  io_uring: prevent opcode speculation
  io-wq: backoff when retrying worker creation
2025-02-21 09:17:56 -08:00
Kannappan R c749f058b4 USB: core: Add eUSB2 descriptor and parsing in USB core
Add support for the 'eUSB2 Isochronous Endpoint Companion Descriptor'
introduced in the recent USB 2.0 specification 'USB 2.0 Double Isochronous
IN Bandwidth' ECN.

It allows embedded USB2 (eUSB2) devices to report and use higher bandwidths
for isochronous IN transfers in order to support higher camera resolutions
on the lid of laptops and tablets with minimal change to the USB2 protocol.

The motivation for expanding USB 2.0 is further clarified in an additional
Embedded USB2 version 2.0 (eUSB2v2) supplement to the USB 2.0
specification. It points out this is optimized for performance, power and
cost by using the USB 2.0 low-voltage, power efficient PHY and half-duplex
link for the asymmetric camera bandwidth needs, avoiding the costly and
complex full-duplex USB 3.x symmetric link and gigabit receivers.

eUSB2 devices that support the higher isochronous IN bandwidth and the new
descriptor can be identified by their device descriptor bcdUSB value of
0x0220

Co-developed-by: Amardeep Rai <amardeep.rai@intel.com>
Signed-off-by: Amardeep Rai <amardeep.rai@intel.com>
Signed-off-by: Kannappan R <r.kannappan@intel.com>
Co-developed-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20250220141339.1939448-1-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21 10:45:32 +01:00
Niklas Söderlund 7b0ee2de7c media: uapi: rkisp1-config: Fix typo in extensible params example
The define used for the version in the example diagram does not match what
is defined in enum rksip1_ext_param_buffer_version, nor the description
above it. Correct the typo to make it clear which define to use.

Fixes: e9d05e9d5d ("media: uapi: rkisp1-config: Add extensible params format")
Cc: stable@vger.kernel.org
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-02-21 10:33:01 +01:00
Alexei Starovoitov bd4319b6c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf bpf-6.14-rc4
Cross-merge bpf fixes after downstream PR (bpf-6.14-rc4).

Minor conflict:
  kernel/bpf/btf.c
Adjacent changes:
  kernel/bpf/arena.c
  kernel/bpf/btf.c
  kernel/bpf/syscall.c
  kernel/bpf/verifier.c
  mm/memory.c

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-20 18:13:57 -08:00
Song Yoong Siang ca4419f15a xsk: Add launch time hardware offload support to XDP Tx metadata
Extend the XDP Tx metadata framework so that user can requests launch time
hardware offload, where the Ethernet device will schedule the packet for
transmission at a pre-determined time called launch time. The value of
launch time is communicated from user space to Ethernet driver via
launch_time field of struct xsk_tx_metadata.

Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250216093430.957880-2-yoong.siang.song@intel.com
2025-02-20 15:13:45 -08:00
Jason Xing c9525d240c bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback
This patch introduces a new callback in tcp_tx_timestamp() to correlate
tcp_sendmsg timestamp with timestamps from other tx timestamping
callbacks (e.g., SND/SW/ACK).

Without this patch, BPF program wouldn't know which timestamps belong
to which flow because of no socket lock protection. This new callback
is inserted in tcp_tx_timestamp() to address this issue because
tcp_tx_timestamp() still owns the same socket lock with
tcp_sendmsg_locked() in the meanwhile tcp_tx_timestamp() initializes
the timestamping related fields for the skb, especially tskey. The
tskey is the bridge to do the correlation.

For TCP, BPF program hooks the beginning of tcp_sendmsg_locked() and
then stores the sendmsg timestamp at the bpf_sk_storage, correlating
this timestamp with its tskey that are later used in other sending
timestamping callbacks.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-11-kerneljasonxing@gmail.com
2025-02-20 14:29:48 -08:00
Jason Xing b3b81e6b00 bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback
Support the ACK case for bpf timestamping.

Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_ACK_CB. This
callback will occur at the same timestamping point as the user
space's SCM_TSTAMP_ACK. The BPF program can use it to get the
same SCM_TSTAMP_ACK timestamp without modifying the user-space
application.

This patch extends txstamp_ack to two bits: 1 stands for
SO_TIMESTAMPING mode, 2 bpf extension.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-10-kerneljasonxing@gmail.com
2025-02-20 14:29:43 -08:00
Jason Xing 2deaf7f42b bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
Support hw SCM_TSTAMP_SND case for bpf timestamping.

Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_HW_CB. This
callback will occur at the same timestamping point as the user
space's hardware SCM_TSTAMP_SND. The BPF program can use it to
get the same SCM_TSTAMP_SND timestamp without modifying the
user-space application.

To avoid increasing the code complexity, replace SKBTX_HW_TSTAMP
with SKBTX_HW_TSTAMP_NOBPF instead of changing numerous callers
from driver side using SKBTX_HW_TSTAMP. The new definition of
SKBTX_HW_TSTAMP means the combination tests of socket timestamping
and bpf timestamping. After this patch, drivers can work under the
bpf timestamping.

Considering some drivers don't assign the skb with hardware
timestamp, this patch does the assignment and then BPF program
can acquire the hwstamp from skb directly.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-9-kerneljasonxing@gmail.com
2025-02-20 14:29:36 -08:00
Jason Xing ecebb17ad8 bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
Support sw SCM_TSTAMP_SND case for bpf timestamping.

Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_SW_CB. This
callback will occur at the same timestamping point as the user
space's software SCM_TSTAMP_SND. The BPF program can use it to
get the same SCM_TSTAMP_SND timestamp without modifying the
user-space application.

Based on this patch, BPF program will get the software
timestamp when the driver is ready to send the skb. In the
sebsequent patch, the hardware timestamp will be supported.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-8-kerneljasonxing@gmail.com
2025-02-20 14:29:30 -08:00
Jason Xing 6b98ec7e88 bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
Support SCM_TSTAMP_SCHED case for bpf timestamping.

Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SCHED_CB. This
callback will occur at the same timestamping point as the user
space's SCM_TSTAMP_SCHED. The BPF program can use it to get the
same SCM_TSTAMP_SCHED timestamp without modifying the user-space
application.

A new SKBTX_BPF flag is added to mark skb_shinfo(skb)->tx_flags,
ensuring that the new BPF timestamping and the current user
space's SO_TIMESTAMPING do not interfere with each other.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-7-kerneljasonxing@gmail.com
2025-02-20 14:29:24 -08:00
Jason Xing 24e82b7c04 bpf: Add networking timestamping support to bpf_get/setsockopt()
The new SK_BPF_CB_FLAGS and new SK_BPF_CB_TX_TIMESTAMPING are
added to bpf_get/setsockopt. The later patches will implement the
BPF networking timestamping. The BPF program will use
bpf_setsockopt(SK_BPF_CB_FLAGS, SK_BPF_CB_TX_TIMESTAMPING) to
enable the BPF networking timestamping on a socket.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-2-kerneljasonxing@gmail.com
2025-02-20 14:28:37 -08:00
Jakub Kicinski 5d6ba5ab85 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.14-rc4).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20 10:37:30 -08:00
Jens Axboe 19f7e94273 io_uring/epoll: add support for IORING_OP_EPOLL_WAIT
For existing epoll event loops that can't fully convert to io_uring,
the used approach is usually to add the io_uring fd to the epoll
instance and use epoll_wait() to wait on both "legacy" and io_uring
events. While this work, it isn't optimal as:

1) epoll_wait() is pretty limited in what it can do. It does not support
   partial reaping of events, or waiting on a batch of events.

2) When an io_uring ring is added to an epoll instance, it activates the
   io_uring "I'm being polled" logic which slows things down.

Rather than use this approach, with EPOLL_WAIT support added to io_uring,
event loops can use the normal io_uring wait logic for everything, as
long as an epoll wait request has been armed with io_uring.

Note that IORING_OP_EPOLL_WAIT does NOT take a timeout value, as this
is an async request. Waiting on io_uring events in general has various
timeout parameters, and those are the ones that should be used when
waiting on any kind of request. If events are immediately available for
reaping, then This opcode will return those immediately. If none are
available, then it will post an async completion when they become
available.

cqe->res will contain either an error code (< 0 value) for a malformed
request, invalid epoll instance, etc. It will return a positive result
indicating how many events were reaped.

IORING_OP_EPOLL_WAIT requests may be canceled using the normal io_uring
cancelation infrastructure.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-20 07:59:56 -07:00
Jens Axboe 9269919478 Merge branch 'for-6.15/io_uring-rx-zc' into for-6.15/io_uring-epoll-wait
* for-6.15/io_uring-rx-zc: (77 commits)
  io_uring: Rename KConfig to Kconfig
  io_uring/zcrx: fix leaks on failed registration
  io_uring/zcrx: recheck ifq on shutdown
  io_uring/zcrx: add selftest
  net: add documentation for io_uring zcrx
  io_uring/zcrx: add copy fallback
  io_uring/zcrx: throttle receive requests
  io_uring/zcrx: set pp memory provider for an rx queue
  io_uring/zcrx: add io_recvzc request
  io_uring/zcrx: dma-map area for the device
  io_uring/zcrx: implement zerocopy receive pp memory provider
  io_uring/zcrx: grab a net device
  io_uring/zcrx: add io_zcrx_area
  io_uring/zcrx: add interface queue and refill queue
  net: add helpers for setting a memory provider on an rx queue
  net: page_pool: add memory provider helpers
  net: prepare for non devmem TCP memory providers
  net: page_pool: add a mp hook to unregister_netdevice*
  net: page_pool: add callback for mp info printing
  netdev: add io_uring memory provider info
  ...
2025-02-20 07:59:30 -07:00
Paolo Abeni 384cba25b8 linux-can-next-for-6.15-20250219
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEn/sM2K9nqF/8FWzzDHRl3/mQkZwFAme1voYTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAMdGXf+ZCRnFOCB/0dTrOBuRiX1Gg6yIkYs3lzrWlmCv98
 itowE3y2RfAxlFZXaTSNTCFKrKpyk56C13tcq2okaqCWrBzcDkNeVemsNKJ49OBK
 pYdxXsiJtddaVg6qJ2up2f/4btgbD1tEH0vzDm2S4z5YigjDFuAiQslbNi6rFSSt
 Lcp4yhPcmQ+L+3nGYP0HmYPIX0K1gWL3jFEFPr4e/XWiCKn9bYOUdYRUV6xVa1Dz
 lHkA8rqLJ+Gdwc6P3FgSdrTUafCFfgeFlPNnl1ULiLchHK2ZklPWZjFBb/jhKRJM
 /Ef9RgpMXrSb8P6lS25RWptTSFnKrg00VqVWV0Fi9rxELe3OvSdoAUuK
 =URof
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-6.15-20250219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2025-02-19

this is a pull request of 12 patches for net-next/master.

The first 4 patches are by Krzysztof Kozlowski and simplify the c_can
driver's c_can_plat_probe() function.

Ciprian Marian Costea contributes 3 patches to add S32G2/S32G3 support
to the flexcan driver.

Ruffalo Lavoisier's patch removes a duplicated word from the mcp251xfd
DT bindings documentation.

Oleksij Rempel extends the J1939 documentation.

The next patch is by Oliver Hartkopp and adds access for the Remote
Request Substitution bit in CAN-XL frames.

Henrik Brix Andersen's patch for the gs_usb driver adds support for
the CANnectivity firmware.

The last patch is by Robin van der Gracht and removes a duplicated
setup of RX FIFO in the rockchip_canfd driver.

linux-can-next-for-6.15-20250219

* tag 'linux-can-next-for-6.15-20250219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: rockchip_canfd: rkcanfd_chip_fifo_setup(): remove duplicated setup of RX FIFO
  can: gs_usb: add VID/PID for the CANnectivity firmware
  can: canxl: support Remote Request Substitution bit access
  can: j1939: Extend stack documentation with buffer size behavior
  dt-binding: can: mcp251xfd: remove duplicate word
  can: flexcan: add NXP S32G2/S32G3 SoC support
  can: flexcan: Add quirk to handle separate interrupt lines for mailboxes
  dt-bindings: can: fsl,flexcan: add S32G2/S32G3 SoC support
  can: c_can: Use syscon_regmap_lookup_by_phandle_args
  can: c_can: Use of_property_present() to test existence of DT property
  can: c_can: Simplify handling syscon error path
  can: c_can: Drop useless final probe failure message
====================

Link: https://patch.msgid.link/20250219113354.529611-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20 10:18:37 +01:00
Ido Schimmel 39f970aead net: fib_rules: Add port mask attributes
Add attributes that allow matching on source and destination ports with
a mask. Matching on the source port with a mask is needed in deployments
where users encode path information into certain bits of the UDP source
port.

Temporarily set the type of the attributes to 'NLA_REJECT' while support
is being added.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19 18:43:37 -08:00
Linus Torvalds 87a132e739 18 hotfixes. 5 are cc:stable and the remainder address post-6.13 issues
or aren't considered necessary for -stable kernels.
 
 10 are for MM and 8 are for non-MM.  All are singletons, please see the
 changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ7aKTwAKCRDdBJ7gKXxA
 jo9eAQD0GBh7LaeobM+OJBN0E+u/wKySR/QpGfQX1h/uTpcOPAEA+Q5yaNcmFIzO
 NB/htGoMpW2F9gru3pwAT7CgnE3qeg8=
 =Y0sw
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-02-19-17-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "18 hotfixes. 5 are cc:stable and the remainder address post-6.13
  issues or aren't considered necessary for -stable kernels.

  10 are for MM and 8 are for non-MM. All are singletons, please see the
  changelogs for details"

* tag 'mm-hotfixes-stable-2025-02-19-17-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  test_xarray: fix failure in check_pause when CONFIG_XARRAY_MULTI is not defined
  kasan: don't call find_vm_area() in a PREEMPT_RT kernel
  MAINTAINERS: update Nick's contact info
  selftests/mm: fix check for running THP tests
  mm: hugetlb: avoid fallback for specific node allocation of 1G pages
  memcg: avoid dead loop when setting memory.max
  mailmap: update Nick's entry
  mm: pgtable: fix incorrect reclaim of non-empty PTE pages
  taskstats: modify taskstats version
  getdelays: fix error format characters
  mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize()
  tools/mm: fix build warnings with musl-libc
  mailmap: add entry for Feng Tang
  .mailmap: add entries for Jeff Johnson
  mm,madvise,hugetlb: check for 0-length range after end address adjustment
  mm/zswap: fix inconsistency when zswap_store_page() fails
  lib/iov_iter: fix import_iovec_ubuf iovec management
  procfs: fix a locking bug in a vmcore_add_device_dump() error path
2025-02-19 18:11:28 -08:00
Oliver Hartkopp e1b2c7e902 can: canxl: support Remote Request Substitution bit access
The Remote Request Substitution bit is a dominant bit ("0") in the CAN
XL frame. As some CAN XL controllers support to access this bit a new
CANXL_RRS value has been defined for the canxl_frame.flags element.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20250124142347.7444-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-02-19 11:10:54 +01:00
Jens Axboe 1fc61eeefe io_uring: fix spelling error in uapi io_uring.h
This is obviously not that important, but when changes are synced back
from the kernel to liburing, the codespell CI ends up erroring because
of this misspelling. Let's just correct it and avoid this biting us
again on an import.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-18 16:47:40 -07:00
Rorie Reyes 07d89045bf s390/vfio-ap: Signal eventfd when guest AP configuration is changed
In this patch, an eventfd object is created by the vfio_ap device driver
and used to notify userspace when a guests's AP configuration is
dynamically changed. Such changes may occur whenever:

* An adapter, domain or control domain is assigned to or unassigned from a
  mediated device that is attached to the guest.
* A queue assigned to the mediated device that is attached to a guest is
  bound to or unbound from the vfio_ap device driver. This can occur
  either by manually binding/unbinding the queue via the vfio_ap driver's
  sysfs bind/unbind attribute interfaces, or because an adapter, domain or
  control domain assigned to the mediated device is added to or removed
  from the host's AP configuration via an SE/HMC

The purpose of this patch is to provide immediate notification of changes
made to a guest's AP configuration by the vfio_ap driver. This will enable
the guest to take immediate action rather than relying on polling or some
other inefficient mechanism to detect changes to its AP configuration.

Note that there are corresponding QEMU patches that will be shipped along
with this patch (see vfio-ap: Report vfio-ap configuration changes) that
will pick up the eventfd signal.

Signed-off-by: Rorie Reyes <rreyes@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Tested-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20250107183645.90082-1-rreyes@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-02-18 18:53:48 +01:00
Linus Torvalds 6537cfb395 sound fixes for 6.14-rc4
A slightly large collection of fixes, spread over various drivers.
 Almost all are small and device-specific fixes and quirks in ASoC
 SOF Intel and AMD, Renesas, Cirrus, HD-audio, in addition to a small
 fix for MIDI 2.0.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAme0bwIOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE+6SQ//ZUPAzra8D1zIR9Gfk/ImULAPOpK2R9MK6Tg9
 u6LIbFEUVZ1/A5MbsYWbW6bdndu0q/pWYHmOSNbpbWPpnxANu/SGTqex6KjkEUWw
 Co5P/Mv9ZQWflI1tXGNtneS1cAnJ888qKrkNF2JnzENa0TeuiVOUuCNUI0Ro/SQ+
 nxGtd1239zgzXYyruSe+Gj8haetBR3RfFKD042cGYY9TNNsnVFd0Tlsi5OLZYSk1
 E0w4mJ92E3kpL9slFDVC9xa+pPy4e39y+2xpXqwN15kh86J5Gvp2w+Z8fz0W7EaV
 6yZwx2xlM+DDO71gXoX4XwMLn93b4gQbFMjPdssDhVP9dH8aA04lRFDmvnRpy2eq
 m8c7TLw8M/QNtz784GvmmgZavv4/twbL0x7HvZ1VvSenbFX9UxozPaYHnfh+Ev4T
 vwjq8S7DX35gKKu7rEk/euGDnQ2wHEfD+f7inCKf1fB1o0oj3q0yxD6Olld3/M3L
 8ALiDWQ8Yx42kpYpEpeQvStR0Ocb1BRQ5DrAhG2/v/z78sAZMorXF58cpzLphmmS
 wSDlZm4rNe23AZ5GUh6BwrdUpQjttQEyakMZ97xm1T/KDZ4niY1i4f5SLi6MR0ak
 ErkSSD0idSKdnxuZwuJCh18NjAgZE5ve5qgOPOYnUPr47LbXZuq/nCQAZ2ibFzV5
 QlHZDQQ=
 =F+UA
 -----END PGP SIGNATURE-----

Merge tag 'sound-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A slightly large collection of fixes, spread over various drivers.

  Almost all are small and device-specific fixes and quirks in ASoC SOF
  Intel and AMD, Renesas, Cirrus, HD-audio, in addition to a small fix
  for MIDI 2.0"

* tag 'sound-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (41 commits)
  ALSA: seq: Drop UMP events when no UMP-conversion is set
  ALSA: hda/conexant: Add quirk for HP ProBook 450 G4 mute LED
  ALSA: hda/cirrus: Reduce codec resume time
  ALSA: hda/cirrus: Correct the full scale volume set logic
  virtio_snd.h: clarify that `controls` depends on VIRTIO_SND_F_CTLS
  ALSA: hda: Add error check for snd_ctl_rename_id() in snd_hda_create_dig_out_ctls()
  ALSA: hda/tas2781: Fix index issue in tas2781 hda SPI driver
  ASoC: imx-audmix: remove cpu_mclk which is from cpu dai device
  ALSA: hda/realtek: Fixup ALC225 depop procedure
  ALSA: hda/tas2781: Update tas2781 hda SPI driver
  ASoC: cs35l41: Fix acpi_device_hid() not found
  ASoC: SOF: amd: Add branch prediction hint in ACP IRQ handler
  ASoC: SOF: amd: Handle IPC replies before FW_BOOT_COMPLETE
  ASoC: SOF: amd: Drop unused includes from Vangogh driver
  ASoC: SOF: amd: Add post_fw_run_delay ACP quirk
  ASoC: Intel: soc-acpi-intel-ptl-match: revise typo of rt713_vb_l2_rt1320_l13
  ASoC: Intel: soc-acpi-intel-ptl-match: revise typo of rt712_vb + rt1320 support
  ALSA: Switch to use hrtimer_setup()
  ALSA: hda: hda-intel: add Panther Lake-H support
  ASoC: SOF: Intel: pci-ptl: Add support for PTL-H
  ...
2025-02-18 09:00:31 -08:00
Wang Yaxin b016d08737 taskstats: modify taskstats version
After adding "delay max" and "delay min" to the taskstats structure, the
taskstats version needs to be updated.

Link: https://lkml.kernel.org/r/20250208144901218Q5ptVpqsQkb2MOEmW4Ujn@zte.com.cn
Fixes: f65c64f311 ("delayacct: add delay min to record delay peak")
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn>
Reviewed-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17 22:40:02 -08:00
Joe Damato df524c8f57 netdev-genl: Add an XSK attribute to queues
Expose a new per-queue nest attribute, xsk, which will be present for
queues that are being used for AF_XDP. If the queue is not being used for
AF_XDP, the nest will not be present.

In the future, this attribute can be extended to include more data about
XSK as it is needed.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250214211255.14194-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17 16:46:03 -08:00