Commit Graph

12244 Commits (15e9e00a5aa4f56ca1cff7749c166e072d7cb6ac)

Author SHA1 Message Date
Linus Torvalds c92b4d3dd5 for-7.1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmnYSG0ACgkQxWXV+ddt
 WDumDQ/9E8ms1vZcfMwZUf48o7Z2fHnZMUy6dXKHnH72NiRrqSP2jZnhluT6qGqb
 MmmnqvmKFNfJ0J5QLZTgFz/MWzY7PQEIG8WkQ3JvT6iKO5Csa2vFzCXv1oaGWo+m
 TIw++3IS+GliKYQedgVXMYRKFc24OP95RO+Grsh8pMOXWcpSO60oSrTPyzbkdfid
 +Gv4CpSRTCCl/qQ8ZX2PRQ9tLJtR2IAnJBWkwE/MPWxFfkt0oBiauy/BoiddGwrl
 ocDn5fH2CnORwONLGPbVg0ScVNMaRFJfYVrI18N8pfT+4ZVeJFiWGiRnrqSmk8PG
 a8BT51VPZZunyGoVFZmpqOhsy8PtqpjX0ljpebY7K69fH+1ewrWVE9ovs/nZ6Hq+
 DgB9pXu2OxKdyByHfr8Pl/0A2naWOrQ0JHOGnVsEg2qDi67vy5EBUIYQbiS9uo4s
 IFdd5bA04DS0Khzp2Y8Crrc2tWootsRCcUs6oiwKgKVBoqtNbFvVHKJqfi8XZB6i
 W4/rL+F0gBVzR127TZF+tejd1jq9u6WOBRKlwkHK5DoWXiv84oLv/zdwtqinTWLs
 N7LOFfDgYwH1YNPx12tEm9DW3Ef76RlHPZiTAmG4NUphmgwkKaYOosqsX7WvrMqR
 kkeKfbsRm4M/lQDLwd8IBUloMhl2+uspxJrkNUy/31pxWxByGvk=
 =mVDJ
 -----END PGP SIGNATURE-----

Merge tag 'for-7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "User visible changes:

   - move shutdown ioctl support out of experimental features; it causes a
     forced stop of filesystem operation until the next unmount.
     Additionally, there's a super block operation to forcibly remove a
     device from under the filesystem, which may or may not lead to a
     shutdown, depending on whether the redundancy allows it

   - report filesystem shutdown using fserror mechanism

   - tree-checker updates:
      - verify free space info, extent and bitmap items
      - verify remap-tree items and related data in block group items

  Performance improvements:

   - speed up clearing first extent in the tracked range (+10%
     throughput on sample workload)

   - reduce COW rewrites of extent buffers during the same transaction

   - avoid taking big device lock to update device stats during
     transaction commit

   - fix unnecessary flush on close when truncating empty files
     (observed in practice on a backup application)

   - prevent direct reclaim during compressed readahead to avoid stalls
     under memory pressure

  Notable fixes:

   - fix chunk allocation strategy on RAID1-like block groups with
     disproportionate device sizes; this could lead to ENOSPC due to
     skewed reservation estimates

   - adjust metadata reservation overcommit ratio to be less aggressive
     and also try to flush if possible; this avoids ENOSPC and potential
     transaction aborts in some edge cases (that are otherwise hard to
     reproduce)

   - fix silent IO error loss in encoded writes and ordered extent split
     in zoned mode; the error was not correctly propagated to the address
     space and could lead to zeroed ranges

   - don't mark inline files NOCOMPRESS unexpectedly; the intent was to
     do that for single block writes of regular files

   - fix deadlock between reflink and transaction commit when using
     flushoncommit

   - fix overly strict item check of a running dev-replace operation

  Core:

   - zoned mode space reservation fixes:
      - cap delayed refs metadata reservation to avoid overcommit
      - update logic to reclaim partially unusable zones
      - add another state to flush and reclaim partially used zone
      - limit number of zones reclaimed in one go to avoid blocking
        other operations

   - don't let log trees consume global reserve on overcommit and fall
     back to transaction commit

   - revalidate extent buffer when checking its up-to-date status

   - add self tests for zoned mode block group specifics

   - reduce atomic allocations in some qgroup paths

   - avoid unnecessary root node COW during snapshotting

   - start new transaction in block group relocation conditionally

   - faster check of NOCOW files on currently snapshotted root

   - change how compressed bio size is tracked from bio and reduce the
     structure size

   - new tracepoint for search slot restart tracking

   - checksum list manipulation improvements

   - type, parameter cleanups, refactoring

   - error handling improvements, transaction abort call adjustments"
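The RAID1 allocation fix listed above concerns devices of very different sizes. As a rough illustration only (this is the textbook bound, not the btrfs allocator or code from this pull), the sketch below computes the maximum usable space for a two-copy profile in which both copies must sit on distinct devices: the smaller of half the raw total and the total minus the largest device. The function name is hypothetical and sizes are in arbitrary units.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: upper bound on usable space for a two-copy
 * (RAID1-like) profile where the copies must be on different devices.
 * Not btrfs's actual allocator logic.
 */
uint64_t raid1_usable_bound(const uint64_t *dev_sizes, size_t ndevs)
{
	uint64_t total = 0, largest = 0;

	if (ndevs < 2)
		return 0; /* two copies need at least two devices */

	for (size_t i = 0; i < ndevs; i++) {
		total += dev_sizes[i];
		if (dev_sizes[i] > largest)
			largest = dev_sizes[i];
	}

	/* Every byte stored consumes two bytes of raw space... */
	uint64_t half = total / 2;
	/* ...and the largest device can only mirror what the others hold. */
	uint64_t rest = total - largest;

	return half < rest ? half : rest;
}
```

With one 1000-unit device and two 100-unit devices the bound is 200, far below the 600 that "half the raw space" would suggest; that gap illustrates how estimates based on raw totals can be skewed when device sizes are disproportionate.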

* tag 'for-7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (116 commits)
  btrfs: btrfs_log_dev_io_error() on all bio errors
  btrfs: fix silent IO error loss in encoded writes and zoned split
  btrfs: skip clearing EXTENT_DEFRAG for NOCOW ordered extents
  btrfs: use BTRFS_FS_UPDATE_UUID_TREE_GEN flag for UUID tree rescan check
  btrfs: remove duplicate journal_info reset on failure to commit transaction
  btrfs: tag as unlikely if statements that check for fs in error state
  btrfs: fix double free in create_space_info() error path
  btrfs: fix double free in create_space_info_sub_group() error path
  btrfs: do not reject a valid running dev-replace
  btrfs: only invalidate btree inode pages after all ebs are released
  btrfs: prevent direct reclaim during compressed readahead
  btrfs: replace BUG_ON() with error return in cache_save_setup()
  btrfs: zstd: don't cache sectorsize in a local variable
  btrfs: zlib: don't cache sectorsize in a local variable
  btrfs: zlib: drop redundant folio address variable
  btrfs: lzo: inline read/write length helpers
  btrfs: use common eb range validation in read_extent_buffer_to_user_nofault()
  btrfs: read eb folio index right before loops
  btrfs: rename local variable for offset in folio
  btrfs: unify types for binary search variables
  ...
2026-04-13 16:35:32 -07:00
Linus Torvalds 23acda7c22 for-7.1/io_uring-20260411
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmna0vIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpu8MEACN6owH/1suaJp5HBhrKseVIPQl1ldmsGF3
 ZDwZndUE6pWXaeuI3g5QjSPcfWIUuLG6vs/btkIh4M32zAcFsSD8zYPItvgFzMVp
 X762WPCrUcfFwKt5GqeNn6IblO8BrsbzoJWNCaSVRhWqCdzQRVktq6684nNy/fj1
 JBFnMsRpwGhoKzpg1oCLOrs0V57CRdJqFdmMzQHwRTWHemvfHf6SD2+h9axfKCaV
 baqvXGOLQXLwr8qHFo1LIu8lqEltHUa7boU8EMFQn/v8sPjUv46EuqZ8VVtzXH08
 fY2zqWI5atA3DZCfORCHnK0qh6tPiSUtVUilXbIffhqd6lCTs891RJf3TegRCGTZ
 k8WfBFVKzVlhbgGk0Km6+tiHTaK1ZmcKU0Q+uucnb3RlOdOoPvXJy3u+I5BK74aV
 36JmNPWRQfzh5icmrrGKySBTX0z7NPtMiEA+qHEndIO5FWrkf5pf9U5C5gu0WEMh
 iK2gotbd0Vym3EpqKQnefxflce6IpYteOACeYPXAprcQOzPK+WYjiVUJ9JcH6DhP
 RPUIXXck8+GkHnM9vWtBXBKaoR7gcATHUzLX8ZnhDkAhsTJ+tOXN8skq28gglUtj
 8kLMzyXklbhAJsykxKn0rqcNUOcVMatFyK4VIFyp2tWRhzMDAY4xyXYSz0lRowkd
 pZAm4eSkmw==
 =IoaB
 -----END PGP SIGNATURE-----

Merge tag 'for-7.1/io_uring-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring updates from Jens Axboe:

 - Add a callback driven main loop for io_uring, and BPF struct_ops
   on top to allow implementing custom event loop logic

 - Decouple IOPOLL from being a ring-wide all-or-nothing setting,
   allowing IOPOLL use cases to also issue certain whitelisted
   non-polled opcodes

 - Timeout improvements. Migrate internal timeout storage from
   timespec64 to ktime_t for simpler arithmetic and avoid copying of
   timespec data

 - Zero-copy receive (zcrx) updates:

      - Add a device-less mode (ZCRX_REG_NODEV) for testing and
        experimentation where data flows through the copy fallback path

      - Fix two-step unregistration regression, DMA length calculations,
        xarray mark usage, and a potential 32-bit overflow in id
        shifting

      - Refactoring toward multi-area support: dedicated refill queue
        struct, consolidated DMA syncing, netmem array refilling format,
        and guard-based locking

 - Zero-copy transmit (zctx) cleanup:

      - Unify io_send_zc() and io_sendmsg_zc() into a single function

      - Add vectorized registered buffer send for IORING_OP_SEND_ZC

      - Add separate notification user_data via sqe->addr3 so
        notification and completion CQEs can be distinguished without
        extra reference counting

 - Switch struct io_ring_ctx internal bitfields to explicit flag bits
   with atomic-safe accessors, and annotate the known harmless races on
   those flags

 - Various optimizations caching ctx and other request fields in local
   variables to avoid repeated loads, and cleanups for tctx setup, ring
   fd registration, and read path early returns
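The timeout bullet above moves internal storage from timespec64 to ktime_t, a signed 64-bit nanosecond count. A userspace sketch of why that simplifies arithmetic: once seconds and nanoseconds are flattened into one value, computing a deadline or checking expiry is plain integer math with no carry between fields. All type and function names below are hypothetical stand-ins, not the kernel's helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Hypothetical mirror of the kernel's timespec64 fields. */
struct sketch_timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Hypothetical stand-in for ktime_t: a signed count of nanoseconds. */
typedef int64_t sketch_ktime_t;

/* Flatten seconds + nanoseconds into one value, once, at setup time. */
sketch_ktime_t sketch_timespec_to_ktime(const struct sketch_timespec64 *ts)
{
	return ts->tv_sec * SKETCH_NSEC_PER_SEC + ts->tv_nsec;
}

/* A deadline is a single addition, with no tv_nsec carry into tv_sec. */
sketch_ktime_t sketch_deadline(sketch_ktime_t now, sketch_ktime_t timeout)
{
	return now + timeout;
}

/* Expiry is a single signed comparison. */
bool sketch_expired(sketch_ktime_t now, sketch_ktime_t deadline)
{
	return now >= deadline;
}
```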

* tag 'for-7.1/io_uring-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (58 commits)
  io_uring: unify getting ctx from passed in file descriptor
  io_uring/register: don't get a reference to the registered ring fd
  io_uring/tctx: clean up __io_uring_add_tctx_node() error handling
  io_uring/tctx: have io_uring_alloc_task_context() return tctx
  io_uring/timeout: use 'ctx' consistently
  io_uring/rw: clean up __io_read() obsolete comment and early returns
  io_uring/zcrx: use correct mmap off constants
  io_uring/zcrx: use dma_len for chunk size calculation
  io_uring/zcrx: don't clear not allocated niovs
  io_uring/zcrx: don't use mark0 for allocating xarray
  io_uring: cast id to u64 before shifting in io_allocate_rbuf_ring()
  io_uring/zcrx: reject REG_NODEV with large rx_buf_size
  io_uring/cancel: validate opcode for IORING_ASYNC_CANCEL_OP
  io_uring/rsrc: use io_cache_free() to free node
  io_uring/zcrx: rename zcrx [un]register functions
  io_uring/zcrx: check ctrl op payload struct sizes
  io_uring/zcrx: cache fallback availability in zcrx ctx
  io_uring/zcrx: warn on a repeated area append
  io_uring/zcrx: consolidate dma syncing
  io_uring/zcrx: netmem array as refiling format
  ...
2026-04-13 16:22:30 -07:00
Linus Torvalds 7fe6ac157b for-7.1/block-20260411
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmna0tgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptEbD/0ZMEsz5pcN+/bpM9Qva5lVVkByRieua+JA
 T7L+JMcEigp1Hf2idAPlv1e9dbrtgOGhkjZNlbZenP2MHXBmbUTnzTWDKW5w0ZQ4
 UqnVC7fMmxzI57DPt7iG/1WQo8O6QPHWwBof5ZXn0b83qwByTB2oVkAb9ysT7CdM
 wGk5KnPRLIAWf5o+aZ4LoWE+196jQiszx1m6U58FTqnCgvJ/GyKyrgzx+uvGUgF+
 owZT/6TrN7cN9A68fOnmcjEZ7beZXygOQPTn32sF9rEOi8JsgK71EE2LofdVVSNU
 ES/tyKVJbSNDgUH2b0T84rErT4MtZcw5J29V3k7CVndC+DcT2uLSroPz3lYQjDg9
 TLeq7ZLjnyoBG+muboWdXcvBKn3aKLec3nfVSbz6J1xb/Z22gWYy5TZbrGnGH8fJ
 zBiyKkHMaZi55IdTDWQT3a48h36qFh0Y2wbvZ6uhyYOfXHyj4pA4ccJZgFfmf4ZG
 flVRFGEL9Tqc82lB8dfy9DBp0ZQSjeBUCd+gyDKjiuWVau5L5iTUeMMkt8yr7qbg
 PY+ATJcHk5S5zwM2xcZUt5EcHBBbCaKQ6DdRZKwzMMUvCjHlvnWvENVjUtRa9Dng
 1vUKpB/e5NGpqD05Iqgyai+OD9/tALc4sUEI2yQ7/dk9pKIXQ4RE9HR/pSkgbjeR
 LGokj08cgg==
 =ga3t
 -----END PGP SIGNATURE-----

Merge tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block updates from Jens Axboe:

 - Add shared memory zero-copy I/O support for ublk, bypassing per-I/O
   copies between kernel and userspace by matching registered buffer
   PFNs at I/O time. Includes selftests.

 - Refactor bio integrity to support filesystem initiated integrity
   operations and arbitrary buffer alignment.

 - Clean up bio allocation, splitting bio_alloc_bioset() into clear fast
   and slow paths. Add bio_await() and bio_submit_or_kill() helpers,
   unify synchronous bi_end_io callbacks.

 - Fix zone write plug refcount handling and plug removal races. Add
   support for serializing zone writes at QD=1 for rotational zoned
   devices, yielding significant throughput improvements.

 - Add SED-OPAL ioctls for Single User Mode management and a STACK_RESET
   command.

 - Add io_uring passthrough (uring_cmd) support to the BSG layer.

 - Replace pp_buf in partition scanning with struct seq_buf.

 - zloop improvements and cleanups.

 - drbd genl cleanup, switching to pre_doit/post_doit.

 - NVMe pull request via Keith:
      - Fabrics authentication updates
      - Enhanced block queue limits support
      - Workqueue usage updates
      - A new write zeroes device quirk
      - Tagset cleanup fix for loop device

 - MD pull requests via Yu Kuai:
      - Fix raid5 soft lockup in retry_aligned_read()
      - Fix raid10 deadlock with check operation and nowait requests
      - Fix raid1 overlapping writes on writemostly disks
      - Fix sysfs deadlock on array_state=clear
      - Proactive RAID-5 parity building with llbitmap, with
        write_zeroes_unmap optimization for initial sync
      - Fix llbitmap barrier ordering, rdev skipping, and bitmap_ops
        version mismatch fallback
      - Fix bcache use-after-free and uninitialized closure
      - Validate raid5 journal metadata payload size
      - Various cleanups

 - Various other fixes, improvements, and cleanups
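The ublk shared-memory zero-copy support matches the PFNs of an I/O's pages against registered buffers, and one of the listed commits verifies that every page of a multi-page bvec falls within the registered range. A minimal userspace sketch of that containment check, assuming each registered buffer is one contiguous PFN range; the struct and function names are hypothetical, not the driver's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registered buffer: a contiguous run of page frame numbers. */
struct sketch_pfn_range {
	uint64_t start_pfn;
	uint64_t nr_pages;
};

/*
 * Return the index of the registered range that contains every page in
 * [pfn, pfn + nr_pages), or -1 if no single range covers all of them.
 * Checking only the first page would accept a bvec whose tail runs past
 * the end of the registration.
 */
long sketch_find_covering_range(const struct sketch_pfn_range *ranges,
				size_t nranges, uint64_t pfn, uint64_t nr_pages)
{
	if (nr_pages == 0)
		return -1;

	for (size_t i = 0; i < nranges; i++) {
		uint64_t start = ranges[i].start_pfn;
		uint64_t end = start + ranges[i].nr_pages; /* exclusive */

		/* Compare against end - pfn to avoid overflowing pfn + nr_pages. */
		if (pfn >= start && pfn < end && nr_pages <= end - pfn)
			return (long)i;
	}
	return -1;
}
```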

* tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (146 commits)
  ublk: fix tautological comparison warning in ublk_ctrl_reg_buf
  scsi: bsg: fix buffer overflow in scsi_bsg_uring_cmd()
  block: refactor blkdev_zone_mgmt_ioctl
  MAINTAINERS: update ublk driver maintainer email
  Documentation: ublk: address review comments for SHMEM_ZC docs
  ublk: allow buffer registration before device is started
  ublk: replace xarray with IDA for shmem buffer index allocation
  ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
  ublk: verify all pages in multi-page bvec fall within registered range
  ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
  xfs: use bio_await in xfs_zone_gc_reset_sync
  block: add a bio_submit_or_kill helper
  block: factor out a bio_await helper
  block: unify the synchronous bi_end_io callbacks
  xfs: fix number of GC bvecs
  selftests/ublk: add read-only buffer registration test
  selftests/ublk: add filesystem fio verify test for shmem_zc
  selftests/ublk: add hugetlbfs shmem_zc test for loop target
  selftests/ublk: add shared memory zero-copy test
  selftests/ublk: add UBLK_F_SHMEM_ZC support for loop target
  ...
2026-04-13 15:51:31 -07:00
Linus Torvalds b8f82cb0d8 Landlock update for v7.1-rc1
-----BEGIN PGP SIGNATURE-----
 
 iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCadfgCxAcbWljQGRpZ2lr
 b2QubmV0AAoJEOXj0OiMgvbSl5cA/0QZjJ+4V2DVzJQM5qzmNK9He9uYaOs7F2Ks
 xRvg7IebAPwMEcVY+CVQxD+YGj08UgM753yx4CRbhsu4k5mowEEJDQ==
 =Lz9R
 -----END PGP SIGNATURE-----

Merge tag 'landlock-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull Landlock update from Mickaël Salaün:
 "This adds a new Landlock access right for pathname UNIX domain sockets
  thanks to a new LSM hook, and a few fixes"

* tag 'landlock-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (23 commits)
  landlock: Document fallocate(2) as another truncation corner case
  landlock: Document FS access right for pathname UNIX sockets
  selftests/landlock: Simplify ruleset creation and enforcement in fs_test
  selftests/landlock: Check that coredump sockets stay unrestricted
  selftests/landlock: Audit test for LANDLOCK_ACCESS_FS_RESOLVE_UNIX
  selftests/landlock: Test LANDLOCK_ACCESS_FS_RESOLVE_UNIX
  selftests/landlock: Replace access_fs_16 with ACCESS_ALL in fs_test
  samples/landlock: Add support for named UNIX domain socket restrictions
  landlock: Clarify BUILD_BUG_ON check in scoping logic
  landlock: Control pathname UNIX domain socket resolution by path
  landlock: Use mem_is_zero() in is_layer_masks_allowed()
  lsm: Add LSM hook security_unix_find
  landlock: Fix kernel-doc warning for pointer-to-array parameters
  landlock: Fix formatting in tsync.c
  landlock: Improve kernel-doc "Return:" section consistency
  landlock: Add missing kernel-doc "Return:" sections
  selftests/landlock: Fix format warning for __u64 in net_test
  selftests/landlock: Skip stale records in audit_match_record()
  selftests/landlock: Drain stale audit records on init
  selftests/landlock: Fix socket file descriptor leaks in audit helpers
  ...
2026-04-13 15:42:19 -07:00
Linus Torvalds de639344bb audit/stable-7.1 PR 20260410
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmnZegUUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNydxAApWBVRWp/AY7jtCQGWRYAa+6y+bQ0
 RWfu8putXaOyk3NTeWP64e87FKsdByR/yflefYxMH+bXc2mwbuUZYAreEVmLCJ1P
 QxHKuwCkCNOz90n/Y7nlDSDK1GYdzlFkCgidfr4iNSCD58WMTtNNpZREzaNiR8a1
 PZ3bFvJH+S7BRCGA6/S/20rNYeWTga56pSrWt6VpMwVHGJ1R4DsD60pT8z0NqMYI
 BTBLeZ36HlZdwUp+APldKNNDRKG1ZQVKJRO68qcSkopr4vQzK7yL/SJsCdU8MHj2
 LccXTCTHHWJbpdiE7BtzPO9UobVZIdcz2wsnJHWxzHYtXlPolgM7F31111GL4HSv
 V/mq5o7dR3h6nn+1gkWHjOpd/f3J3xl3FaJsH9FIIhPmCRHb4oZI0WG0ZH3mHZBl
 o6aaWja3PBl0XNA+q87DQVBYDOyVNB4RjuaKy+d7hm4eronTRaZkg3zutrB6/XxP
 uFbp+Q3diWNMsYO52DKFThL/sStmnnCMIRJuTxd8QaPhLVakaFSkWZycSUH4HijD
 8WMk3e4yo3TeD6rCAognwKclj0vCMHS3TLOMXlY0vMD04gwXJ2S81yfyXGT4F5De
 KkXj61TFMxPyiZ6yrxk86BmoqHL0DUiCDn1rMKbNdIncHedKZoNuy+O/XNLS6No/
 hLRvXSI7MNthJ5E=
 =1rY2
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:

 - Improved handling of unknown status requests from userspace

   The current kernel code ignores unknown/unused request bits sent from
   userspace and returns an error code based on the results of the
   request(s) it does understand. The patch from Ricardo fixes this so
   that unknown requests return an -EINVAL to userspace, making
   compatibility a bit easier moving forward.

 - A number of small style and formatting cleanups
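The status-request change above rejects requests carrying bits the kernel does not recognize. The usual pattern is to compare the request mask against the set of known bits before acting on any of them. A minimal sketch with hypothetical bit values and names; it does not reproduce audit_receive_msg() or the real AUDIT_STATUS_* constants.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical request bits; the real ones live in include/uapi/linux/audit.h. */
#define SKETCH_STATUS_ENABLED 0x0001u
#define SKETCH_STATUS_FAILURE 0x0002u
#define SKETCH_STATUS_PID     0x0004u
#define SKETCH_STATUS_KNOWN   (SKETCH_STATUS_ENABLED | SKETCH_STATUS_FAILURE | \
			       SKETCH_STATUS_PID)

/*
 * Validate the whole mask up front: any unknown bit fails the request with
 * -EINVAL instead of being silently ignored while known bits are applied.
 */
int sketch_check_status_mask(uint32_t mask)
{
	if (mask & ~SKETCH_STATUS_KNOWN)
		return -EINVAL;
	return 0;
}
```

Because this sketch checks the mask before applying any field, a rejected request has no partial side effects.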

* tag 'audit-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: handle unknown status requests in audit_receive_msg()
  audit: fix coding style issues
  audit: remove redundant initialization of static variables to 0
  audit: fix whitespace alignment in include/uapi/linux/audit.h
2026-04-13 14:56:54 -07:00
Linus Torvalds 07c3ef5822 vfs-7.1-rc1.pidfs
Please consider pulling these changes from the signed vfs-7.1-rc1.pidfs tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCwAKCRCRxhvAZXjc
 omfuAQDckt5g7vxBr9hKdyrq1//nsu44fst/mRqr2iSYjuKfPQD/VN6Lw9e56Y/q
 l4hHxsPPrSSxbijwng7im36iPIGdfwI=
 =BbFh
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.1-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull clone and pidfs updates from Christian Brauner:
 "Add three new clone3() flags for pidfd-based process lifecycle
  management.

  CLONE_AUTOREAP:

     CLONE_AUTOREAP makes a child process auto-reap on exit without ever
     becoming a zombie. This is a per-process property in contrast to
     the existing auto-reap mechanism via SA_NOCLDWAIT or SIG_IGN for
     SIGCHLD which applies to all children of a given parent.

     Currently the only way to automatically reap children is to set
     SA_NOCLDWAIT or SIG_IGN on SIGCHLD. This is a parent-scoped
     property affecting all children which makes it unsuitable for
     libraries or applications that need selective auto-reaping of
     specific children while still being able to wait() on others.

     CLONE_AUTOREAP stores an autoreap flag in the child's
     signal_struct. When the child exits do_notify_parent() checks this
     flag and causes exit_notify() to transition the task directly to
     EXIT_DEAD. Since the flag lives on the child it survives
     reparenting: if the original parent exits and the child is
     reparented to a subreaper or init the child still auto-reaps when
     it eventually exits. This is cleaner than forcing the subreaper to
     get SIGCHLD and then reaping it. If the parent doesn't care the
     subreaper won't care. If there's a subreaper that would care it
     would be easy enough to add a prctl() that either just turns back
     on SIGCHLD and turns off auto-reaping or a prctl() that just
     notifies the subreaper whenever a child is reparented to it.

     CLONE_AUTOREAP can be combined with CLONE_PIDFD to allow the parent
     to monitor the child's exit via poll() and retrieve exit status via
     PIDFD_GET_INFO. Without CLONE_PIDFD it provides a fire-and-forget
     pattern. No exit signal is delivered so exit_signal must be zero.
     CLONE_THREAD and CLONE_PARENT are rejected: CLONE_THREAD because
     autoreap is a process-level property, and CLONE_PARENT because an
     autoreap child reparented via CLONE_PARENT could become an
     invisible zombie under a parent that never calls wait().

     The flag is not inherited by the autoreap process's own children.
     Each child that should be autoreaped must be explicitly created
     with CLONE_AUTOREAP.

  CLONE_NNP:

     CLONE_NNP sets no_new_privs on the child at clone time. Unlike
     prctl(PR_SET_NO_NEW_PRIVS) which a process sets on itself,
     CLONE_NNP allows the parent to impose no_new_privs on the child at
     creation without affecting the parent's own privileges.
     CLONE_THREAD is rejected because threads share credentials.
     CLONE_NNP is useful on its own for any spawn-and-sandbox pattern
     but was specifically introduced to enable unprivileged usage of
     CLONE_PIDFD_AUTOKILL.

  CLONE_PIDFD_AUTOKILL:

     This flag ties a child's lifetime to the pidfd returned from
     clone3(). When the last reference to the struct file created by
     clone3() is closed the kernel sends SIGKILL to the child. A pidfd
     obtained via pidfd_open() for the same process does not keep the
     child alive and does not trigger autokill - only the specific
     struct file from clone3() has this property. This is useful for
     container runtimes, service managers, and sandboxed subprocess
     execution - any scenario where the child must die if the parent
     crashes or abandons the pidfd, or where the parent just wants a
     throwaway helper process.

     CLONE_PIDFD_AUTOKILL requires both CLONE_PIDFD and CLONE_AUTOREAP.
     It requires CLONE_PIDFD because the whole point is tying the
     child's lifetime to the pidfd. It requires CLONE_AUTOREAP because a
     killed child with no one to reap it would become a zombie - the
     primary use case is the parent crashing or abandoning the pidfd so
     no one is around to call waitpid(). CLONE_THREAD is rejected
     because autokill targets a process not a thread.

     If CLONE_NNP is specified together with CLONE_PIDFD_AUTOKILL, an
     unprivileged user may spawn a process that is autokilled. The child
     cannot escalate privileges via setuid/setgid exec after being
     spawned. If CLONE_PIDFD_AUTOKILL is specified without CLONE_NNP, the
     caller must have CAP_SYS_ADMIN in its user namespace"
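The flag constraints described above can be summarized as one validation function. This is a userspace sketch only: the SKETCH_CLONE_* bit values are hypothetical placeholders rather than the real UAPI constants, and returning -EPERM for the missing-privilege case is an assumption (the text says only that CAP_SYS_ADMIN is required).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real constants come from the UAPI headers. */
#define SKETCH_CLONE_PIDFD          (1ULL << 0)
#define SKETCH_CLONE_PARENT         (1ULL << 1)
#define SKETCH_CLONE_THREAD         (1ULL << 2)
#define SKETCH_CLONE_AUTOREAP       (1ULL << 3)
#define SKETCH_CLONE_NNP            (1ULL << 4)
#define SKETCH_CLONE_PIDFD_AUTOKILL (1ULL << 5)

/* Check the combination rules for the three new flags. */
int sketch_check_clone_flags(uint64_t flags, int exit_signal, bool has_sys_admin)
{
	if (flags & SKETCH_CLONE_AUTOREAP) {
		/* No exit signal is delivered, so exit_signal must be zero. */
		if (exit_signal != 0)
			return -EINVAL;
		/* Autoreap is per-process; CLONE_PARENT risks invisible zombies. */
		if (flags & (SKETCH_CLONE_THREAD | SKETCH_CLONE_PARENT))
			return -EINVAL;
	}

	/* Threads share credentials, so no_new_privs cannot be per-thread. */
	if ((flags & SKETCH_CLONE_NNP) && (flags & SKETCH_CLONE_THREAD))
		return -EINVAL;

	if (flags & SKETCH_CLONE_PIDFD_AUTOKILL) {
		/* Needs the pidfd to tie lifetime to, and autoreap to avoid zombies. */
		if (!(flags & SKETCH_CLONE_PIDFD) || !(flags & SKETCH_CLONE_AUTOREAP))
			return -EINVAL;
		/* Autokill targets a process, not a thread. */
		if (flags & SKETCH_CLONE_THREAD)
			return -EINVAL;
		/* Unprivileged use is allowed only together with CLONE_NNP. */
		if (!(flags & SKETCH_CLONE_NNP) && !has_sys_admin)
			return -EPERM; /* assumed error code */
	}
	return 0;
}
```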

* tag 'vfs-7.1-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests: check pidfd_info->coredump_code correctness
  pidfds: add coredump_code field to pidfd_info
  kselftest/coredump: reintroduce null pointer dereference
  selftests/pidfd: add CLONE_PIDFD_AUTOKILL tests
  selftests/pidfd: add CLONE_NNP tests
  selftests/pidfd: add CLONE_AUTOREAP tests
  pidfd: add CLONE_PIDFD_AUTOKILL
  clone: add CLONE_NNP
  clone: add CLONE_AUTOREAP
2026-04-13 13:27:11 -07:00
Linus Torvalds 086aca1030 s390:
* KVM: s390: vsie: Fix races with partial gmap invalidations
 
 x86:
 * KVM: x86: Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAs
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmnaOrgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPTOwf8DW6BXYgHdDOWiiQQETD+D/JWbceG
 9fKMdNjt48It3hKWc9oJ2eZU2avRHf7d8hAIUIhOeiUbeVf4QrLUfQXzP9j/9P+T
 vRpMlDf5Ampv3m8LxTBGESgwrlRHtWDGUFsE+CcVAIWEQfCsXnbwkeo3L9aCLTgA
 ekrnHqsx+Oh/n2+siEp0Nz0n8gT0hCtbqAqJlVcuHpJvzRzeDvcnukvHxjIydR65
 uIFY5dahzheGqbPhplGKKAdPCHD+/S6QB+ShqKrT92zeZvhPZrt1XHV4Bt/sKSkP
 9uAeuJ+JtbZvMG7n8fCg5ebwqJrw15uddZcV8l3qIuxHzyZ/XzKYhQm4oA==
 =7zjP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "s390:
   - vsie: Fix races with partial gmap invalidations

  x86:
   - Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: s390: vsie: Fix races with partial gmap invalidations
  KVM: x86: Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAs
2026-04-11 11:45:20 -07:00
Paolo Bonzini 0e9b0e0124 KVM x86 fixes for 7.1
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmnZHqsACgkQOlYIJqCj
 N/331BAApsvBOvcKxeHM598wAsZTtBIUiMWrm/hybB2zhXxbcC9BDPN5NrYP9eJX
 khlLm9YRDI2Hvk8QNuwPXV/mHU5U0HNJ48BUToL6H5x6792dnRCbL046rYCyRZbi
 bUjMcUjTWtv7g+UEoYOsMpmYQlTlf4krCbw2ixn6/4c2Ab76TRmrISU+tknoal+b
 KGXEJWhsEiueUD8xpjR84P0h3d6x+EHc2oyDk/k+aFZAcbjWFz/8aLjOSVc00V36
 DqYKbMGO/22CNkWSLk9Dr6mitn6HmG151HNAvUvHlPMFQLrP9jpk13u1IHZsR7H4
 4yykj4tm5+02775IFqfPLNZ4Ipk70WO50ndl3plh7G7187ckYsfl+oQu2c4oIveB
 sPfnEHqteGKw+GLOM9Xu4MVCAvy7FFGlgGIkZpULA7cLNyIayfmKuZTF7/UEh9Wy
 fL2UAypzJjCIgLFyoio4CHqLJ4vuUneyHyoJczS/Wd9kuL0kHrLEw771gLGjwslB
 Nk0900qPlxEWx47G6MadzSh+JexjT9KrSCgaACNWKJpzQMkcw6gwDSIGq9F/f0Ac
 Zl7XFbQ2KKm/Z8CWCTg5XdxI6zz0NCzlcYoepk7CfHgs0xZjWqu5AReLcQnffx2/
 c5YKoOffkicokffWQju64kjE/VYpdAITHONa1r7hv73/bfYfUGg=
 =VJsI
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-fixes-7.1' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 7.1

Declare flexible arrays in uAPI structures using __DECLARE_FLEX_ARRAY() so
that KVM's uAPI headers can be included in C++ projects.
2026-04-11 14:10:44 +02:00
Linus Torvalds e774d5f1bc RISC-V updates for v7.0-rc8
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmnYP5YACgkQx4+xDQu9
 KkuoPQ//Yye5D+35EqfA12yP96Vrtg0QCKiqMotz3yLo0T7zh5KosAs/QIE5eQi7
 vWRnCld5PsFa0ZS2822oPfQo8pKVO1y7M2ecFWSwaOWq865Xs82M/puqEQF3GFCS
 219cg1dTVBGvvKSf4MINUBRprfZmZRT9pzhSk79qHEbHKzwCDk7uah51iUdyPJyd
 KX3hshYMLq3rooTHR2wD/ChTpV+pCrt2rSUVbW8+sTUWDfv2sTLauHmemKw7LpdW
 C0SulXvcYkGyiqsB5AXW9x2ttJ5hX9diPb73XS6eBCU0CaMl9BVZWNKeqhEMJxKR
 wmqIadD8pelf7Jh7wGAbNW4hWqTsO3xRpZH38Y/cGLdhs3cqvKjEmT3fOFWUP9bP
 hWv5027gVXVSOmvxhPiUJs7D5WWAz4Q64JZfdJSmDdEWVXcI0v/hzdukuPw4iiT6
 DaqOyClTcwc+j1jawFTICXTF7wXfvZT5sjulrmPk1HX4nZ5padKpfQ77AdKHF9Q6
 9pC25QHQk42h/R4ynA4lm15YnCOfYvjP25hU7K64gQnqO6qBrolfrA4kJOmdYv/g
 1IXsA2YZafJbcXwyFZjWy50uu5gaCM5JhRRFdUrjmB6j3gv9HfBlWJXQywReUjPo
 Kq4tnFppxzFVm23COj9j5kyjsFjUhZ8KCft3+n7lrndeOCk5Z3E=
 =5/Ct
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-v7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Paul Walmsley:
 "Before v7.0 is released, fix a few issues with the CFI patchset,
  merged earlier in v7.0-rc, that primarily affect interfaces to
  non-kernel code:

   - Improve the prctl() interface for per-task indirect branch landing
     pad control to expand abbreviations and to resemble the speculation
     control prctl() interface

   - Expand the "LP" and "SS" abbreviations in the ptrace uapi header
     file to "branch landing pad" and "shadow stack", to improve
     readability

   - Fix a typo in a CFI-related macro name in the ptrace uapi header
     file

   - Ensure that the indirect branch tracking state and shadow stack
     state are unlocked immediately after an exec() on the new task so
     that libc subsequently can control it

   - While working in this area, clean up the kernel-internal,
     cross-architecture prctl() function names by expanding the
     abbreviations mentioned above"

* tag 'riscv-for-linus-v7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  prctl: cfi: change the branch landing pad prctl()s to be more descriptive
  riscv: ptrace: cfi: expand "SS" references to "shadow stack" in uapi headers
  prctl: rename branch landing pad implementation functions to be more explicit
  riscv: ptrace: expand "LP" references to "branch landing pads" in uapi headers
  riscv: cfi: clear CFI lock status in start_thread()
  riscv: ptrace: cfi: fix "PRACE" typo in uapi header
2026-04-10 17:27:08 -07:00
Ming Lei 23b3b6f0b5 ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
The __u32 len field cannot represent a 4GB buffer (0x100000000
overflows to 0). Change it to __u64 so buffers up to 4GB can be
registered. Add a reserved field for alignment and validate it
is zero.

The kernel enforces a default max of 4GB (UBLK_SHMEM_BUF_SIZE_MAX),
which may be increased in the future.
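The overflow is easy to reproduce: 4GB is 2^32 bytes, one more than the largest value a __u32 can hold, so storing it wraps to zero. A sketch of a widened layout and the two checks the commit describes (reserved field must be zero, length capped at the 4GB default). The struct name, field order, and the addr/flags fields are hypothetical, not the real ublk UAPI.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical limit mirroring the 4GB default named in the commit. */
#define SKETCH_SHMEM_BUF_SIZE_MAX (1ULL << 32)

/* Hypothetical registration struct: 64-bit length plus a reserved field. */
struct sketch_shmem_buf_reg {
	uint64_t addr;
	uint64_t len;      /* previously 32 bits: 4GB truncated to 0 */
	uint32_t flags;
	uint32_t reserved; /* pads to 8-byte alignment; must be zero */
};

int sketch_validate_buf_reg(const struct sketch_shmem_buf_reg *reg)
{
	if (reg->reserved != 0)
		return -EINVAL;
	if (reg->len > SKETCH_SHMEM_BUF_SIZE_MAX)
		return -EINVAL;
	return 0;
}

/* What a 32-bit length field would have stored for a given length. */
uint32_t sketch_old_len(uint64_t len)
{
	return (uint32_t)len; /* 0x100000000 becomes 0 */
}
```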

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260409133020.3780098-2-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-09 19:08:35 -06:00
Linus Torvalds 7f87a5ea75 hid-for-linus-2026040801
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEL65usyKPHcrRDEicpmLzj2vtYEkFAmnWuXcACgkQpmLzj2vt
 YEnxBA/+IUv7shC+2A9VOmVpPKvF7aP82hyUAaALF0JaUtGuwmKNuMy4g13S+go/
 5b6MWDlZ2pjN5lY6jFBVLcxfCysKKS3BKgfPza3y0PK1CoVeXzWniTuVjaHY9Mi+
 +I64gAa/Ss1C872meBKvzNvgRnHzDFy6v7bp2Ly57Elack9ZZal6KfXBB3sMkVVB
 VCcFAODhiPbr4lgRlMdn6WIEnbNM3TAkmPUKTb4iyOV5qyB5/JQPWT7oqTv0FV6A
 z8AYAMVmpVFZyQT4NMKFdNPumziPTp4sQZuo9Wt8ybQuIlzaw1E1npkooNU9mfZd
 suKwQAEFnGDWA0T/5rySQ9c2P732PyBgGXpTzPAUUVmsK+LYD/CK7gkCkfursDoI
 RJQ8KE/RurQ7OKg8ABFpMw3OnmOfpwH7R4/EJ02PbXFEHWXGt37DrRiqJ5ud1UOj
 khm6b+nF1NbLzg+B5fZP7DtlPC76wOXHlAbiFloqEkG6gJl0fLfhmgFi5Pg/GZ92
 eWJdUJ5ECnJL9UL2FTDxmFEcIyNBrZh+TS0MHdfvhSUHoyGWO5WNzzTqB/RCrgBF
 UVthzr/kSuvIkXbdjG1jDD/y+GHMVjv7HSSeh1p/l/1QfWljk6cglpl6LecaHC2Z
 ee2Ra2Jnw7l8PvQq6VH9WkRU94CtQtCHEyztjPW9IaR8CQk+2Oo=
 =KzYV
 -----END PGP SIGNATURE-----

Merge tag 'hid-for-linus-2026040801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - handling of new keycodes for contextual AI usages (Akshai Murari)

 - fix for UAF in hid-roccat (Benoît Sevens)

 - deduplication of error logging in amd_sfh (Maximilian Pezzullo)

 - various device-specific quirks and device ID additions (Even Xu, Lode
   Willems, Leo Vriska)

* tag 'hid-for-linus-2026040801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  Input: add keycodes for contextual AI usages (HUTRR119)
  HID: Kysona: Add support for VXE Dragonfly R1 Pro
  HID: amd_sfh: don't log error when device discovery fails with -EOPNOTSUPP
  HID: quirks: add HID_QUIRK_ALWAYS_POLL for 8BitDo Pro 3
  HID: roccat: fix use-after-free in roccat_report_event
  HID: Intel-thc-hid: Intel-quickspi: Add NVL Device IDs
  HID: Intel-thc-hid: Intel-quicki2c: Add NVL Device IDs
2026-04-08 13:38:30 -07:00
Qu Wenruo 52e71eb95c btrfs: tree-checker: introduce checks for FREE_SPACE_INFO
Introduce checks for FREE_SPACE_INFO item, which include:

- Key alignment check
  The objectid is the logical bytenr of the chunk/bg, and offset is the
  length of the chunk/bg, thus they should all be aligned to the fs
  block size.

- Item size check
  The FREE_SPACE_INFO item should have a fixed size.

- Flags check
  The flags member should have no other flags than
  BTRFS_FREE_SPACE_USING_BITMAPS.

  For future expansion, introduce a new macro
  BTRFS_FREE_SPACE_FLAGS_MASK for such checks.

  And since we're here, BTRFS_FREE_SPACE_USING_BITMAPS should not
  use unsigned long long, as the flags member is only 32 bits wide.
  So fix that to use unsigned long.

- Extent count check
  That member shows how many free space bitmap/extent items there are
  inside the chunk/bg.

  We know the chunk size (from key->offset), thus there should be at
  most (key->offset >> sectorsize_bits) blocks inside the chunk.
  Use that value as the upper limit and if that counter is larger than
  that, there is a high chance it's a bitflip in high bits.
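The four checks above can be sketched as one standalone C function. This is a simplified userspace illustration, not the tree-checker code. `struct fsi_key_sketch`, `struct fsi_item_sketch` and `check_free_space_info_sketch()` are hypothetical names. The item body is assumed to be two 32-bit fields (extent_count, flags), the mask is assumed to contain only the bitmaps flag, and on-disk little-endian conversion is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the bitmaps flag is currently the only valid flag. */
#define BTRFS_FREE_SPACE_USING_BITMAPS (1UL << 0)
#define BTRFS_FREE_SPACE_FLAGS_MASK    (BTRFS_FREE_SPACE_USING_BITMAPS)

/* Hypothetical simplified key: objectid = chunk/bg start, offset = length. */
struct fsi_key_sketch {
	uint64_t objectid;
	uint64_t offset;
};

/* Hypothetical simplified item body, assumed to be a fixed 8 bytes. */
struct fsi_item_sketch {
	uint32_t extent_count;
	uint32_t flags;
};

/* Returns true if the item passes all four checks. */
static bool check_free_space_info_sketch(const struct fsi_key_sketch *key,
					 const struct fsi_item_sketch *item,
					 size_t item_size,
					 uint32_t sectorsize_bits)
{
	const uint64_t blocksize = 1ULL << sectorsize_bits;

	/* 1. Key alignment: start and length must be block aligned. */
	if ((key->objectid & (blocksize - 1)) || (key->offset & (blocksize - 1)))
		return false;

	/* 2. Item size: the item has a fixed size. */
	if (item_size != sizeof(*item))
		return false;

	/* 3. Flags: no bits outside the known mask. */
	if (item->flags & ~BTRFS_FREE_SPACE_FLAGS_MASK)
		return false;

	/* 4. Extent count: at most one entry per block in the chunk. */
	if (item->extent_count > (key->offset >> sectorsize_bits))
		return false;

	return true;
}
```

With 4 KiB blocks (sectorsize_bits = 12), a 1 GiB block group can hold at most 262144 entries, so a count just above that is rejected as a likely high-bit flip.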

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07 18:56:01 +02:00
Günther Noack ae97330d1b
landlock: Control pathname UNIX domain socket resolution by path
* Add a new access right LANDLOCK_ACCESS_FS_RESOLVE_UNIX, which
  controls the lookup operations for named UNIX domain sockets.  The
  resolution happens during connect() and sendmsg() (depending on
  socket type).
* Change access_mask_t from u16 to u32 (see below)
* Hook into the path lookup in unix_find_bsd() in af_unix.c, using a
  LSM hook.  Make policy decisions based on the new access rights.
* Increment the Landlock ABI version.
* Minor test adaptations to keep the tests working.
* Document the design rationale for scoped access rights,
  and cross-reference it from the header documentation.

With this access right, access is granted if either of the following
conditions is met:

* The target socket's filesystem path was allow-listed using a
  LANDLOCK_RULE_PATH_BENEATH rule, *or*:
* The target socket was created in the same Landlock domain in which
  LANDLOCK_ACCESS_FS_RESOLVE_UNIX was restricted.

In case of a denial, connect() and sendmsg() return EACCES, which is
the same error that is returned if the user lacks write permission in
the traditional UNIX file system permissions of that file.
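The grant rule above can be written as a small decision function. This is an illustrative sketch only. `resolve_unix_check_sketch()` and its two boolean inputs are hypothetical stand-ins for the path-rule lookup and the domain comparison that Landlock performs internally. Only the OR of the two conditions and the EACCES result come from the description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the LANDLOCK_ACCESS_FS_RESOLVE_UNIX decision.
 * path_allowed: the socket path is covered by a LANDLOCK_RULE_PATH_BENEATH
 *               rule that grants the access right.
 * same_domain:  the socket was created in the Landlock domain that
 *               restricted the access right.
 * Returns 0 if the connect()/sendmsg() lookup may proceed, -EACCES otherwise.
 */
static int resolve_unix_check_sketch(bool path_allowed, bool same_domain)
{
	if (path_allowed || same_domain)
		return 0;
	return -EACCES;
}
```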

The access_mask_t type grows from u16 to u32 to make space for the new
access right.  This also doubles the size of struct layer_access_masks
from 32 bytes to 64 bytes.  To avoid memory layout inconsistencies between
architectures (especially m68k), pack and align struct access_masks [2].

Document the (possible future) interaction between scoped flags and
other access rights in struct landlock_ruleset_attr, and summarize the
rationale, as discussed in code review leading up to [3].

This feature was created with substantial discussion and input from
Justin Suess, Tingmao Wang and Mickaël Salaün.

Cc: Tingmao Wang <m@maowtm.org>
Cc: Justin Suess <utilityemal77@gmail.com>
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Suggested-by: Jann Horn <jannh@google.com>
Link[1]: https://github.com/landlock-lsm/linux/issues/36
Link[2]: https://lore.kernel.org/all/20260401.Re1Eesu1Yaij@digikod.net/
Link[3]: https://lore.kernel.org/all/20260205.8531e4005118@gnoack.org/
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20260327164838.38231-5-gnoack3000@gmail.com
[mic: Fix kernel-doc formatting, pack and align access_masks]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:06 +02:00
Mickaël Salaün e75e38055b
landlock: Allow TSYNC with LOG_SUBDOMAINS_OFF and fd=-1
LANDLOCK_RESTRICT_SELF_TSYNC does not allow
LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF with ruleset_fd=-1, preventing
a multithreaded process from atomically propagating subdomain log muting
to all threads without creating a domain layer.  Relax the fd=-1
condition to accept TSYNC alongside LOG_SUBDOMAINS_OFF, and update the
documentation accordingly.
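The relaxed ruleset_fd=-1 rule can be sketched as a flag check. This sketch assumes that LOG_SUBDOMAINS_OFF stays mandatory with fd=-1 and that TSYNC is accepted only as an addition to it. The `DEMO_*` bit values and `fd_minus_one_flags_ok_sketch()` are hypothetical placeholders, not the values in `<linux/landlock.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder bit values; the real ones are in the UAPI header. */
#define DEMO_LOG_SUBDOMAINS_OFF (1U << 2)
#define DEMO_TSYNC              (1U << 3)

/*
 * Sketch of the flag rule for ruleset_fd == -1 after this change:
 * LOG_SUBDOMAINS_OFF alone, or LOG_SUBDOMAINS_OFF together with TSYNC.
 * Every other combination (including TSYNC alone) is assumed rejected.
 */
static bool fd_minus_one_flags_ok_sketch(uint32_t flags)
{
	return flags == DEMO_LOG_SUBDOMAINS_OFF ||
	       flags == (DEMO_LOG_SUBDOMAINS_OFF | DEMO_TSYNC);
}
```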

Add flag validation tests for all TSYNC combinations with ruleset_fd=-1,
and audit tests verifying both transition directions: muting via TSYNC
(logged to not logged) and override via TSYNC (not logged to logged).

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 42fc7e6543 ("landlock: Multithreading support for landlock_restrict_self()")
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260407164107.2012589-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-04-07 18:51:00 +02:00
Ming Lei 08677040a9 ublk: enable UBLK_F_SHMEM_ZC feature flag
Add UBLK_F_SHMEM_ZC (1ULL << 19) to the UAPI header and UBLK_F_ALL.
Switch ublk_support_shmem_zc() and ublk_dev_support_shmem_zc() from
returning false to checking the actual flag, enabling the shared
memory zero-copy feature for devices that request it.
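The switch from a hard-coded `false` to a real flag test amounts to a single bit check. `struct ublk_dev_info_sketch` and `ublk_dev_support_shmem_zc_sketch()` are hypothetical stand-ins for the driver's device state and helper. Only the flag value comes from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UBLK_F_SHMEM_ZC (1ULL << 19)   /* value stated in the commit message */

/* Hypothetical stand-in for the device parameters; only flags matters here. */
struct ublk_dev_info_sketch {
	uint64_t flags;
};

/* Previously a stub returning false; now it reflects the requested feature. */
static bool ublk_dev_support_shmem_zc_sketch(const struct ublk_dev_info_sketch *info)
{
	return (info->flags & UBLK_F_SHMEM_ZC) != 0;
}
```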

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-4-ming.lei@redhat.com
[axboe: ublk_buf_reg -> ublk_shmem_buf_reg errors]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07 07:39:24 -06:00
Ming Lei 2fb0ded237 ublk: add UBLK_U_CMD_REG_BUF/UNREG_BUF control commands
Add control commands for registering and unregistering shared memory
buffers for zero-copy I/O:

- UBLK_U_CMD_REG_BUF (0x18): pins pages from userspace, inserts PFN
  ranges into a per-device maple tree for O(log n) lookup during I/O.
  Buffer pointers are tracked in a per-device xarray. Returns the
  assigned buffer index.

- UBLK_U_CMD_UNREG_BUF (0x19): removes PFN entries and unpins pages.

Queue freeze/unfreeze is handled internally so userspace need not
quiesce the device during registration.

Also adds:
- UBLK_IO_F_SHMEM_ZC flag and addr encoding helpers in UAPI header
  (16-bit buffer index supporting up to 65536 buffers)
- Data structures (ublk_buf, ublk_buf_range) and xarray/maple tree
- __ublk_ctrl_reg_buf() helper for PFN insertion with error unwinding
- __ublk_ctrl_unreg_buf() helper for cleanup reuse
- ublk_support_shmem_zc() / ublk_dev_support_shmem_zc() stubs
  (returning false — feature not enabled yet)
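The per-device maple tree maps physical page-frame ranges to a registered buffer, so the I/O path can find the buffer that owns a page in O(log n). The sketch below is a hedged userspace stand-in for that lookup. The kernel uses a maple tree, whereas this code uses a binary search over sorted, non-overlapping PFN ranges. `struct pfn_range_sketch` and `lookup_buf_index_sketch()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One registered PFN range [start, start + nr) owned by buffer buf_index. */
struct pfn_range_sketch {
	uint64_t start;
	uint64_t nr;
	uint16_t buf_index;   /* 16-bit index: up to 65536 buffers */
};

/*
 * Binary search over ranges sorted by start and not overlapping.
 * Returns the owning buffer index, or -1 if pfn is not registered.
 */
static int lookup_buf_index_sketch(const struct pfn_range_sketch *r, size_t n,
				   uint64_t pfn)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pfn < r[mid].start)
			hi = mid;
		else if (pfn >= r[mid].start + r[mid].nr)
			lo = mid + 1;
		else
			return r[mid].buf_index;
	}
	return -1;
}
```

A maple tree additionally supports cheap insertion and removal of ranges, which a sorted array does not. That matters for REG_BUF/UNREG_BUF at runtime.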

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-2-ming.lei@redhat.com
[axboe: fixup ublk_buf_reg -> ublk_shmem_buf_reg errors, comments]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07 07:38:26 -06:00
Paul Walmsley 08ee155905 prctl: cfi: change the branch landing pad prctl()s to be more descriptive
Per Linus' comments requesting the replacement of "INDIR_BR_LP" in the
indirect branch tracking prctl()s with something more readable, and
suggesting the use of the speculation control prctl()s as an exemplar,
reimplement the prctl()s and related constants that control per-task
forward-edge control flow integrity.

This primarily involves two changes.  First, the prctls are
restructured to resemble the style of the speculative execution
workaround control prctls PR_{GET,SET}_SPECULATION_CTRL, to make them
easier to extend in the future.  Second, the "indir_br_lp" abbreviation
is expanded to "branch_landing_pads" to be less telegraphic.  The
kselftest and documentation are adjusted accordingly.

Link: https://lore.kernel.org/linux-riscv/CAHk-=whhSLGZAx3N5jJpb4GLFDqH_QvS07D+6BnkPWmCEzTAgw@mail.gmail.com/
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04 18:40:58 -06:00
Pavel Begunkov 825f276491 io_uring/zcrx: implement device-less mode for zcrx
Allow creating a zcrx instance without attaching it to a net device.
All data will be copied through the fallback path. As with a
netdev-backed instance, the user is expected to use ZCRX_CTRL_FLUSH_RQ
to handle overflows; this matters even more here, since there will
likely be no one to pick up buffers automatically.

Apart from that, it follows the zcrx uapi for the I/O path, and is
useful for testing, experimentation, and potentially for the copy
receive path in the future if improved.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/674f8ad679c5a0bc79d538352b3042cf0999596e.1774261953.git.asml.silence@gmail.com
[axboe: fix spelling error in uapi header and commit message]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-01 10:21:12 -06:00
Milan Broz 499d2d2f4c sed-opal: Add STACK_RESET command
The TCG Opal device could enter a state where no new session can be
created, blocking even Discovery or PSID reset. While a power cycle
or waiting for the timeout should work, there is another possibility
for recovery: using the Stack Reset command.

The Stack Reset command is defined in the TCG Storage Architecture Core
Specification and is mandatory for all Opal devices (see Section 3.3.6
of the Opal SSC specification).

This patch implements the Stack Reset command. Sending it should clear
all active sessions immediately, allowing subsequent commands to run
successfully. While it is a TCG transport layer command, the Linux
kernel implements only Opal ioctls, so it makes sense to use the
IOC_OPAL ioctl interface.

The Stack Reset takes no arguments; the response can be success or pending.
If the command reports a pending state, userspace can try to repeat it;
in this case, the code returns -EBUSY.
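From userspace, the kernel's -EBUSY shows up as an ioctl(2) failure with errno set to EBUSY, so a caller can loop on it. The sketch below is hypothetical. `issue_stack_reset` is a caller-supplied callback standing in for the actual ioctl call, whose request name is not given here, and `fake_stack_reset()` is a demo device for trying the loop out.

```c
#include <assert.h>
#include <errno.h>

/*
 * issue_stack_reset: hypothetical callback wrapping the Stack Reset ioctl;
 * like ioctl(2) it returns 0 on success or -1 with errno set.
 * Retries while the device reports "pending" (seen as EBUSY), up to
 * max_attempts. Returns 0 on success, or -1 with errno from the last try.
 */
static int stack_reset_with_retry_sketch(int (*issue_stack_reset)(void *ctx),
					 void *ctx, unsigned int max_attempts)
{
	for (unsigned int i = 0; i < max_attempts; i++) {
		if (issue_stack_reset(ctx) == 0)
			return 0;
		if (errno != EBUSY)
			return -1;   /* a real error, not "pending" */
		/* A real caller would probably sleep briefly here. */
	}
	return -1;               /* still pending after max_attempts */
}

/* Demo device: reports "pending" *(int *)ctx times, then succeeds. */
static int fake_stack_reset(void *ctx)
{
	int *pending_left = ctx;

	if (*pending_left > 0) {
		(*pending_left)--;
		errno = EBUSY;
		return -1;
	}
	return 0;
}
```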

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Reviewed-by: Ondrej Kozina <okozina@redhat.com>
Link: https://patch.msgid.link/20260310095349.411287-1-gmazyland@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-31 07:04:00 -06:00
Akshai Murari 45065a5095 Input: add keycodes for contextual AI usages (HUTRR119)
HUTRR119 introduces new usages for keys intended to invoke AI agents
based on the current context. These are useful given the increasing
number of operating systems with integrated Large Language Models.

Add new key definitions for KEY_ACTION_ON_SELECTION,
KEY_CONTEXTUAL_INSERT and KEY_CONTEXTUAL_QUERY.

Signed-off-by: Akshai Murari <akshaim@google.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2026-03-29 22:02:11 +02:00
David Carlier 8f15b5071b netfilter: ctnetlink: use netlink policy range checks
Replace manual range and mask validations with netlink policy
annotations in ctnetlink code paths, so that the netlink core rejects
invalid values early and can generate extack errors.

- CTA_PROTOINFO_TCP_STATE: reject values > TCP_CONNTRACK_SYN_SENT2 at
  policy level, removing the manual >= TCP_CONNTRACK_MAX check.
- CTA_PROTOINFO_TCP_WSCALE_ORIGINAL/REPLY: reject values > TCP_MAX_WSCALE
  (14). The normal TCP option parsing path already clamps to this value,
  but the ctnetlink path accepted 0-255, causing undefined behavior when
  used as a u32 shift count.
- CTA_FILTER_ORIG_FLAGS/REPLY_FLAGS: use NLA_POLICY_MASK with
  CTA_FILTER_F_ALL, removing the manual mask checks.
- CTA_EXPECT_FLAGS: use NLA_POLICY_MASK with NF_CT_EXPECT_MASK, adding
  a new mask define grouping all valid expect flags.

Extracted from a broader nf-next patch by Florian Westphal, scoped to
ctnetlink for the fixes tree.

Fixes: c8e2078cfe ("[NETFILTER]: ctnetlink: add support for internal tcp connection tracking flags handling")
Signed-off-by: David Carlier <devnexen@gmail.com>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-03-26 13:28:17 +01:00
Emanuele Rocca 701f7f4fba
pidfds: add coredump_code field to pidfd_info
The struct pidfd_info currently exposes in a field called coredump_signal the
signal number (si_signo) that triggered the dump (for example, 11 for SIGSEGV).
However, it is also valuable to understand the reason why that signal was sent.
This additional context is provided by the signal code (si_code), such as 2 for
SEGV_ACCERR.
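A consumer of coredump_signal and coredump_code would typically turn the pair into a readable name. The sketch below is hypothetical (`segv_code_name_sketch()` is not part of any API) and covers only the two classic SIGSEGV codes from `<signal.h>`.

```c
#define _POSIX_C_SOURCE 200809L   /* expose SEGV_* codes in <signal.h> */
#include <assert.h>
#include <signal.h>
#include <string.h>

/*
 * Map a (si_signo, si_code) pair, as reported in coredump_signal and
 * coredump_code, to a readable name. Only two SIGSEGV codes are covered.
 */
static const char *segv_code_name_sketch(int signo, int code)
{
	if (signo != SIGSEGV)
		return "other";
	switch (code) {
	case SEGV_MAPERR:
		return "SEGV_MAPERR";   /* address not mapped to object */
	case SEGV_ACCERR:
		return "SEGV_ACCERR";   /* invalid permissions for mapped object */
	default:
		return "unknown";
	}
}
```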

Add a new field to struct pidfd_info called coredump_code with the value of
si_code for the benefit of sysadmins who pipe core dumps to user-space programs
for later analysis. The following snippet illustrates a simplified C program
that consumes coredump_signal and coredump_code, and then logs core dump
signals and codes to a file:

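    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/pidfd.h>

    int main(int argc, char **argv)
    {
        FILE *f = fopen("/var/log/core-logger.log", "a"); /* example path */
        struct pidfd_info info = {
            .mask = PIDFD_INFO_EXIT | PIDFD_INFO_COREDUMP,
        };
        if (argc < 2 || !f)
            return 1;
        int pidfd = atoi(argv[1]);   /* %F: pidfd passed by the kernel */
        if (ioctl(pidfd, PIDFD_GET_INFO, &info) == 0 &&
            (info.mask & PIDFD_INFO_COREDUMP))
            fprintf(f, "PID=%d, si_signo: %d si_code: %d\n",
                    info.pid, info.coredump_signal, info.coredump_code);
        return fclose(f) != 0;
    }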

Assuming the program is installed under /usr/local/bin/core-logger, core dump
processing can be enabled by setting /proc/sys/kernel/core_pattern to
'|/usr/local/bin/core-logger %F'.

systemd-coredump(8) already uses pidfds to process core dumps, and it could be
extended to include the values of coredump_code too.

Signed-off-by: Emanuele Rocca <emanuele.rocca@arm.com>
Link: https://patch.msgid.link/acE52HIFivNZN3nE@NH27D9T0LF
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-23 16:29:15 +01:00
Yang Xiuwei 7da9261bab bsg: add bsg_uring_cmd uapi structure
Add the bsg_uring_cmd structure to the BSG UAPI header to support
io_uring-based SCSI passthrough operations via IORING_OP_URING_CMD.

Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260317072226.2598233-2-yangxiuwei@kylinos.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-19 11:38:24 -06:00
Linus Torvalds 11e8c7e947 ARM:
- Correctly handle deactivation of interrupts that were activated from
   LRs.  Since EOIcount only denotes deactivation of interrupts that
   are not present in an LR, start EOIcount deactivation walk *after*
   the last irq that made it into an LR.
 
 - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when
   pKVM is already enabled -- not only is this impossible (pKVM
   will reject the call), but it is also useless: this can only
   happen for a CPU that has already booted once, and the capability
   will not change.
 
 - Fix a couple of low-severity bugs in our S2 fault handling path,
   affecting the recently introduced LS64 handling and the even more
   esoteric handling of hwpoison in a nested context
 
 - Address yet another syzkaller finding in the vgic initialisation,
   where we would end-up destroying an uninitialised vgic with nasty
   consequences
 
 - Address an annoying case of pKVM failing to boot when some of the
   memblock regions that the host is faulting in are not page-aligned
 
 - Inject some sanity in the NV stage-2 walker by checking the limits
   against the advertised PA size, and correctly report the resulting
   faults
 
 PPC:
 
 - Fix a PPC e500 build error due to a long-standing wart that was exposed by
   the recent conversion to kmalloc_obj(); rip out all the ugliness that
   led to the wart.
 
 RISC-V:
 
 - Prevent speculative out-of-bounds access using array_index_nospec()
   in APLIC interrupt handling, ONE_REG register access, AIA CSR access,
   float register access, and PMU counter access
 
 - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(),
   kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr()
 
 - Fix potential null pointer dereference in kvm_riscv_vcpu_aia_rmw_topei()
 
 - Fix off-by-one array access in SBI PMU
 
 - Skip THP support check during dirty logging
 
 - Fix error code returned for Smstateen and Ssaia ONE_REG interface
 
 - Check host Ssaia extension when creating AIA irqchip
 
 x86:
 
 - Fix cases where CPUID mitigation features were incorrectly marked as
   available whenever the kernel used scattered feature words for them.
 
 - Validate _all_ GVAs, rather than just the first GVA, when processing
   a range of GVAs for Hyper-V's TLB flush hypercalls.
 
 - Fix a brown paper bug in add_atomic_switch_msr().
 
 - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list,
   to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu.
 
 - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel local
   APIC (and AVIC is enabled at the module level).
 
 - Update CR8 write interception when AVIC is (de)activated, to fix a bug
   where the guest can run in perpetuity with the CR8 intercept enabled.
 
 - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to allow
   L1 hypervisors to set FREEZE_IN_SMM.  This reverts (by default) an
   unintentional tightening of userspace ABI in 6.17, and provides some
   amount of backwards compatibility with hypervisors who want to freeze
   PMCs on VM-Entry.
 
 - Validate the VMCS/VMCB on return to a nested guest from SMM, because
   either userspace or the guest could stash invalid values in memory
   and trigger the processor's consistency checks.
 
 Generic:
 
 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
   unnecessary and confusing, triggered compiler warnings due to
   -Wflex-array-member-not-at-end.
 
 - Document that vcpu->mutex is taken outside of kvm->slots_lock and
   kvm->slots_arch_lock, which is intentional and desirable despite being
   rather unintuitive.
 
 Selftests:
 
 - Increase the maximum number of NUMA nodes in the guest_memfd selftest to
   64 (from 8).
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmmy6n8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNX7ggAhWoCG+AE6P3yrp6Mi+nRYpeRGC3q
 q2IiZCn0UoCg6q3c2kgn7b/N2zLJs0Q8FZRCEp2Je+2uvptpmdp/BMEfiIU3n2/a
 61z+Dydbpyc+kUmhJzUJ+aotq5FnMNmAAmqSKoc19GhAx2OQhQmBP/JOZ0P/eqLE
 Is0qNBgr/Zms2ib3GFf/JT+urysL2mX47qe92HTzq1T9EEG0KleID0Jz8vYQI8Fr
 I5N9+lTxagQDi8ytwOM85Cn8K7wh+CQIgzmciHcVErpAvAWkrEjrPlQltpEz2C5B
 aWEcRgw46utEaAiwPQGJRW6TeoKUG0pUR3v6T90nBkjjJ1npm6gPVE6TBA==
 =7nQ9
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Quite a large pull request, partly due to skipping last week and
  therefore having material from ~all submaintainers in this one. About
  a fourth of it is a new selftest, and a couple more changes are large
  in number of files touched (fixing a -Wflex-array-member-not-at-end
  compiler warning) or lines changed (reformatting of a table in the API
  documentation, thanks rST).

  But who am I kidding---it's a lot of commits and there are a lot of
  bugs being fixed here, some of them on the nastier side like the
  RISC-V ones.

  ARM:

   - Correctly handle deactivation of interrupts that were activated
     from LRs. Since EOIcount only denotes deactivation of interrupts
     that are not present in an LR, start EOIcount deactivation walk
     *after* the last irq that made it into an LR

   - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM
     is already enabled -- not only is this impossible (pKVM will
     reject the call), but it is also useless: this can only happen for
     a CPU that has already booted once, and the capability will not
     change

   - Fix a couple of low-severity bugs in our S2 fault handling path,
     affecting the recently introduced LS64 handling and the even more
     esoteric handling of hwpoison in a nested context

   - Address yet another syzkaller finding in the vgic initialisation,
     where we would end-up destroying an uninitialised vgic with nasty
     consequences

   - Address an annoying case of pKVM failing to boot when some of the
     memblock regions that the host is faulting in are not page-aligned

   - Inject some sanity in the NV stage-2 walker by checking the limits
     against the advertised PA size, and correctly report the resulting
     faults

  PPC:

   - Fix a PPC e500 build error due to a long-standing wart that was
     exposed by the recent conversion to kmalloc_obj(); rip out all the
     ugliness that led to the wart

  RISC-V:

   - Prevent speculative out-of-bounds access using array_index_nospec()
     in APLIC interrupt handling, ONE_REG register access, AIA CSR
     access, float register access, and PMU counter access

   - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(),
     kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr()

   - Fix potential null pointer dereference in
     kvm_riscv_vcpu_aia_rmw_topei()

   - Fix off-by-one array access in SBI PMU

   - Skip THP support check during dirty logging

   - Fix error code returned for Smstateen and Ssaia ONE_REG interface

   - Check host Ssaia extension when creating AIA irqchip

  x86:

   - Fix cases where CPUID mitigation features were incorrectly marked
     as available whenever the kernel used scattered feature words for
     them

   - Validate _all_ GVAs, rather than just the first GVA, when
     processing a range of GVAs for Hyper-V's TLB flush hypercalls

   - Fix a brown paper bug in add_atomic_switch_msr()

   - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list,
     to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu

   - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel
     local APIC (and AVIC is enabled at the module level)

   - Update CR8 write interception when AVIC is (de)activated, to fix a
     bug where the guest can run in perpetuity with the CR8 intercept
     enabled

   - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to
     allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by
     default) an unintentional tightening of userspace ABI in 6.17, and
     provides some amount of backwards compatibility with hypervisors
     who want to freeze PMCs on VM-Entry

   - Validate the VMCS/VMCB on return to a nested guest from SMM,
     because either userspace or the guest could stash invalid values in
     memory and trigger the processor's consistency checks

  Generic:

   - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from
     being unnecessary and confusing, triggered compiler warnings due to
     -Wflex-array-member-not-at-end

   - Document that vcpu->mutex is taken outside of kvm->slots_lock and
     kvm->slots_arch_lock, which is intentional and desirable despite
     being rather unintuitive

  Selftests:

   - Increase the maximum number of NUMA nodes in the guest_memfd
     selftest to 64 (from 8)"
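array_index_nospec() clamps an index with a branchless mask, so that a mispredicted bounds check cannot speculatively read past the array. The sketch below illustrates the idea in userspace and is not the kernel's code. The function names are hypothetical. The kernel additionally hides the index from the optimizer, and architectures can supply their own mask sequence. This sketch also assumes `>>` on a negative long is an arithmetic shift, which holds for GCC and Clang.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/*
 * Returns ~0UL when index < size and 0 otherwise, without a branch.
 * Valid for index, size <= LONG_MAX: in range, the OR has a clear sign bit;
 * out of range, (size - 1 - index) wraps and sets it.
 */
static unsigned long index_mask_sketch(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (BITS_PER_LONG_SKETCH - 1));
}

/* Clamp an index: out-of-range values become 0 even under speculation. */
static unsigned long index_nospec_sketch(unsigned long index, unsigned long size)
{
	return index & index_mask_sketch(index, size);
}
```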

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
  KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8
  Documentation: kvm: fix formatting of the quirks table
  KVM: x86: clarify leave_smm() return value
  selftests: kvm: add a test that VMX validates controls on RSM
  selftests: kvm: extract common functionality out of smm_test.c
  KVM: SVM: check validity of VMCB controls when returning from SMM
  KVM: VMX: check validity of VMCS controls when returning from SMM
  KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
  KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM
  KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers()
  KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr()
  KVM: x86: hyper-v: Validate all GVAs during PV TLB flush
  KVM: x86: synthesize CPUID bits only if CPU capability is set
  KVM: PPC: e500: Rip out "struct tlbe_ref"
  KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type
  KVM: selftests: Increase 'maxnode' for guest_memfd tests
  KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug
  KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail
  KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2()
  ...
2026-03-15 12:22:10 -07:00
David Woodhouse 2619da73bb KVM: x86: Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAs
Commit 94dfc73e7c ("treewide: uapi: Replace zero-length arrays with
flexible-array members") broke the userspace API for C++.

These structures ending in VLAs are typically a *header*, which can be
followed by an arbitrary number of entries. Userspace typically creates
a larger structure with some non-zero number of entries, for example in
QEMU's kvm_arch_get_supported_msr_feature():

    struct {
        struct kvm_msrs info;
        struct kvm_msr_entry entries[1];
    } msr_data = {};

While that works in C, it fails in C++ with an error like:
 flexible array member 'kvm_msrs::entries' not at end of 'struct msr_data'

Fix this by using __DECLARE_FLEX_ARRAY() for the VLA, which uses [0]
for C++ compilation.
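The idea behind the fix can be sketched with a simplified macro that expands to a true flexible array member in C and to a zero-length array (a GNU extension) in C++. `DEMO_DECLARE_FLEX_ARRAY` and the `demo_*` structs are hypothetical stand-ins for `__DECLARE_FLEX_ARRAY()`, `struct kvm_msrs` and `struct kvm_msr_entry`. The real UAPI macro is somewhat more involved in its C form. Embedding the header in a larger struct, as QEMU does, relies on a GNU C extension.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in: [] for C, [0] for C++ so the header can be embedded. */
#ifdef __cplusplus
#define DEMO_DECLARE_FLEX_ARRAY(TYPE, NAME) TYPE NAME[0]
#else
#define DEMO_DECLARE_FLEX_ARRAY(TYPE, NAME) TYPE NAME[]
#endif

struct demo_msr_entry {
	uint32_t index;
	uint32_t reserved;
	uint64_t data;
};

/* Header followed by a variable number of entries. */
struct demo_msrs {
	uint32_t nmsrs;
	uint32_t pad;
	DEMO_DECLARE_FLEX_ARRAY(struct demo_msr_entry, entries);
};

/* QEMU-style pattern: the header plus storage for exactly one entry. */
struct demo_msr_data {
	struct demo_msrs info;
	struct demo_msr_entry entries[1];
};
```

The header's `entries` and the outer `entries` occupy the same address, which is what lets the kernel read the trailing storage through the header.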

Fixes: 94dfc73e7c ("treewide: uapi: Replace zero-length arrays with flexible-array members")
Cc: stable@vger.kernel.org
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/3abaf6aefd6e5efeff3b860ac38421d9dec908db.camel@infradead.org
[sean: tag for stable@]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-12 10:56:10 -07:00
Christian Brauner c8134b5f13
pidfd: add CLONE_PIDFD_AUTOKILL
Add a new clone3() flag CLONE_PIDFD_AUTOKILL that ties a child's
lifetime to the pidfd returned from clone3(). When the last reference to
the struct file created by clone3() is closed, the kernel sends SIGKILL
to the child. A pidfd obtained via pidfd_open() for the same process
does not keep the child alive and does not trigger autokill - only the
specific struct file from clone3() has this property.

This is useful for container runtimes, service managers, and sandboxed
subprocess execution - any scenario where the child must die if the
parent crashes or abandons the pidfd.

CLONE_PIDFD_AUTOKILL requires both CLONE_PIDFD (the whole point is tying
lifetime to the pidfd file) and CLONE_AUTOREAP (a killed child with no
one to reap it would become a zombie). CLONE_THREAD is rejected because
autokill targets a process, not a thread.

The clone3 pidfd is identified by the PIDFD_AUTOKILL file flag set on
the struct file at clone3() time. The pidfs .release handler checks this
flag and sends SIGKILL via do_send_sig_info(SIGKILL, SEND_SIG_PRIV, ...)
only when it is set. Files from pidfd_open() or open_by_handle_at() are
distinct struct files that do not carry this flag. dup()/fork() share the
same struct file so they extend the child's lifetime until the last
reference drops.

CLONE_PIDFD_AUTOKILL uses a privilege model based on CLONE_NNP: without
CLONE_NNP the child could escalate privileges via setuid/setgid exec
after being spawned, so the caller must have CAP_SYS_ADMIN in its user
namespace. With CLONE_NNP the child can never gain new privileges so
unprivileged usage is allowed. This is a deliberate departure from the
pdeath_signal model, which is reset during secureexec and commit_creds(),
rendering it useless for container runtimes that need to deprivilege
themselves.
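The flag and privilege rules above can be collected into one check. This is a hedged sketch. The `DEMO_CLONE_*` bit values are hypothetical placeholders, `has_cap_sys_admin` stands in for the capability check in the caller's user namespace, and the specific errno values are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder bit values; the real ones are in the UAPI headers. */
#define DEMO_CLONE_PIDFD    (1ULL << 0)
#define DEMO_CLONE_THREAD   (1ULL << 1)
#define DEMO_CLONE_PARENT   (1ULL << 2)
#define DEMO_CLONE_AUTOREAP (1ULL << 3)
#define DEMO_CLONE_NNP      (1ULL << 4)
#define DEMO_CLONE_AUTOKILL (1ULL << 5)

/*
 * Sketch of the CLONE_PIDFD_AUTOKILL rules described above.
 * Returns 0 if allowed, -EINVAL for a bad combination, -EPERM for privilege
 * (errno choices are assumptions).
 */
static int autokill_check_sketch(uint64_t flags, bool has_cap_sys_admin)
{
	if (!(flags & DEMO_CLONE_AUTOKILL))
		return 0;
	/* Lifetime is tied to the pidfd, and a killed child must be reaped. */
	if (!(flags & DEMO_CLONE_PIDFD) || !(flags & DEMO_CLONE_AUTOREAP))
		return -EINVAL;
	/* Autokill targets a process, not a thread. */
	if (flags & DEMO_CLONE_THREAD)
		return -EINVAL;
	/* Without NNP the child could gain privileges via setuid/setgid exec. */
	if (!(flags & DEMO_CLONE_NNP) && !has_cap_sys_admin)
		return -EPERM;
	return 0;
}
```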

Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-3-d148b984a989@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-11 23:15:40 +01:00
Christian Brauner 24baca56fa
clone: add CLONE_NNP
Add a new clone3() flag CLONE_NNP that sets no_new_privs on the child
process at clone time. This is analogous to prctl(PR_SET_NO_NEW_PRIVS)
but applied at process creation rather than requiring a separate step
after the child starts running.

CLONE_NNP is rejected with CLONE_THREAD. It's conceptually a lot simpler
to force the whole thread group into NNP than to have individual threads
running around with NNP.
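For comparison, the separate step that CLONE_NNP folds into clone3() looks like this with today's APIs. The child sets no_new_privs itself after it starts running. The sketch uses the real fork(), prctl(PR_SET_NO_NEW_PRIVS) and PR_GET_NO_NEW_PRIVS calls. The function name is hypothetical, and the child reports the flag through its exit status only for demonstration.

```c
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that sets no_new_privs itself (the step CLONE_NNP removes)
 * and reports PR_GET_NO_NEW_PRIVS through its exit status.
 * Returns the child's exit status (1 if NNP was set), or -1 on error.
 */
static int child_nnp_two_step_sketch(void)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Window: until this runs, the child does not have NNP set. */
		if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
			_exit(255);
		_exit(prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0));
	}

	int status;

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```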

Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-2-d148b984a989@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-11 23:15:15 +01:00
Christian Brauner 12ae2c81b2
clone: add CLONE_AUTOREAP
Add a new clone3() flag CLONE_AUTOREAP that makes a child process
auto-reap on exit without ever becoming a zombie. This is a per-process
property in contrast to the existing auto-reap mechanism via
SA_NOCLDWAIT or SIG_IGN for SIGCHLD which applies to all children of a
given parent.

Currently the only way to automatically reap children is to set
SA_NOCLDWAIT or SIG_IGN on SIGCHLD. This is a parent-scoped property
affecting all children, which makes it unsuitable for libraries or
applications that need selective auto-reaping of specific children while
still being able to wait() on others.

CLONE_AUTOREAP stores an autoreap flag in the child's signal_struct.
When the child exits, do_notify_parent() checks this flag and causes
exit_notify() to transition the task directly to EXIT_DEAD. Since the
flag lives on the child it survives reparenting: if the original parent
exits and the child is reparented to a subreaper or init, the child still
auto-reaps when it eventually exits.

CLONE_AUTOREAP can be combined with CLONE_PIDFD to allow the parent to
monitor the child's exit via poll() and retrieve exit status via
PIDFD_GET_INFO. Without CLONE_PIDFD it provides a fire-and-forget
pattern where the parent simply doesn't care about the child's exit
status. No exit signal is delivered so exit_signal must be zero.

CLONE_AUTOREAP is rejected in combination with CLONE_PARENT. If a
CLONE_AUTOREAP child were to clone(CLONE_PARENT) the new grandchild
would inherit exit_signal == 0 from the autoreap parent's group leader
but without signal->autoreap. This grandchild would become a zombie that
never sends a signal and is never autoreaped - confusing and arguably
broken behavior.

The flag is not inherited by the autoreap process's own children. Each
child that should be autoreaped must be explicitly created with
CLONE_AUTOREAP.
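The two argument rules stated above for CLONE_AUTOREAP can be sketched against the relevant `struct clone_args` fields (`flags` and `exit_signal` are real field names). The `DEMO_CLONE_*` bit values, `struct demo_clone_args` and the -EINVAL result are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical placeholder bit values; the real ones are in the UAPI headers. */
#define DEMO_CLONE_PARENT   (1ULL << 2)
#define DEMO_CLONE_AUTOREAP (1ULL << 3)

/* Subset of struct clone_args relevant to this check. */
struct demo_clone_args {
	uint64_t flags;
	uint64_t exit_signal;
};

/* Returns 0 if accepted, -EINVAL otherwise (errno choice assumed). */
static int autoreap_check_sketch(const struct demo_clone_args *args)
{
	if (!(args->flags & DEMO_CLONE_AUTOREAP))
		return 0;
	/* A CLONE_PARENT grandchild would get exit_signal 0 without autoreap. */
	if (args->flags & DEMO_CLONE_PARENT)
		return -EINVAL;
	/* No exit signal is delivered, so none may be requested. */
	if (args->exit_signal != 0)
		return -EINVAL;
	return 0;
}
```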

Link: https://github.com/uapi-group/kernel-features/issues/45
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-1-d148b984a989@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-11 23:14:02 +01:00
Paolo Bonzini 94fe3e6515 KVM generic changes for 7.0
- Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
    unnecessary and confusing, triggered compiler warnings due to
    -Wflex-array-member-not-at-end.
 
  - Document that vcpu->mutex is taken outside of kvm->slots_lock and
    kvm->slots_arch_lock, which is intentional and desirable despite being
    rather unintuitive.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmmp19MACgkQOlYIJqCj
 N/02KA//e7D1DqCcDC46tMyLI+/Q6Wy0F40nXp0tTzJ+gRT5QesEw3jSQdXCRmPV
 yTFLyDaGYD2jqV+EpJLPYBT41oU2FXsjD5NFJRAISD5KPIJbACHvJUxWGYWLvaLU
 iMlwhqZimXKUFAECW2QpwLV8BQenyOEj5dVeKYdPjX6seIEeFlK6JAdteLK0g9gR
 gksE+9QzCFXt0cRfgkaA4UKcA+xWb3ThKMej1AadB6dGF7ezkMvyyQynGLB2N19L
 LZRpOXr70ypyaihC553Msgi4vrpVTPN2BjLrsudGN/IJv6QbdAz5jTU8Lwu9R5QT
 y9LiEPfdMT7WmIBxnH6V7HO5OoN8V2rGJpB/a3KvKO73QjhJJqNyqB6LDPqEbHyw
 AmhQCuQ8Pn1RLKQDXdKll+aI19vi7aOVpq67ii+I9xbzHgg5+uAzKr8hkPAibnVw
 KPGYqgYQa5j3jyRq6jRkAZSkEKZ9PoM8LMiqgnNW1ZrlrDqsPajKaegXODfLuvGf
 yLYtfXbZLMAIAM32YeIH0LrcAT7SEPUFkoh85IB2YOk0mfU1PxqrXOVTPh1GkY2Q
 bKH16T9S4zCfB20V+NYCn+juX4uCNb56b7/jbjI0Ueu/AGv/ITHwRrlhQvXuGSvN
 A65w+LSWlcgRQwLglCPpX308A4DcGCPcY4RvzoirBG+WWNn/Aj4=
 =bD3g
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-generic-7.0-rc3' of https://github.com/kvm-x86/linux into HEAD

KVM generic changes for 7.0

 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
   unnecessary and confusing, triggered compiler warnings due to
   -Wflex-array-member-not-at-end.

 - Document that vcpu->mutex is taken outside of kvm->slots_lock and
   kvm->slots_arch_lock, which is intentional and desirable despite being
   rather unintuitive.
2026-03-11 18:01:55 +01:00
Ondrej Kozina 0cc9293bcc sed-opal: add IOC_OPAL_GET_SUM_STATUS ioctl.
This adds a function for retrieving the set of Locking objects enabled
for Single User Mode (SUM) and the value of the
RangeStartRangeLengthPolicy parameter.

It retrieves data from the LockingInfo table, specifically the
columns SingleUserModeRanges and RangeStartLengthPolicy, which
were added according to the TCG Opal Feature Set: Single User Mode,
as described in chapters 4.4.3.1 and 4.4.3.2.

Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 14:29:59 -06:00
Ondrej Kozina a441a9d224 sed-opal: add IOC_OPAL_ENABLE_DISABLE_LR.
This ioctl is used to set up RLE (read lock enabled) and WLE (write
lock enabled) parameters of the Locking object.

In Single User Mode (SUM), if the RangeStartRangeLengthPolicy parameter
is set in the 'Reactivate' method, only Admin authority maintains the
locking range length and start (offset) attributes of Locking objects
set up for SUM. All other attributes from struct opal_user_lr_setup
(RLE - read locking enabled, WLE - write locking enabled) shall
remain in possession of the User authority associated with the Locking
object set for SUM.

With the IOC_OPAL_ENABLE_DISABLE_LR ioctl, the opal_user_lr_setup
members 'range_start' and 'range_length' of the ioctl argument are
ignored.

Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 14:29:59 -06:00
Ondrej Kozina 8e3d34a7ce sed-opal: add IOC_OPAL_LR_SET_START_LEN ioctl.
This ioctl is used to set up locking range start (offset)
and locking range length attributes only.

In Single User Mode (SUM), if the RangeStartRangeLengthPolicy parameter
is set in the 'Reactivate' method, only Admin authority maintains the
locking range length and start (offset) attributes of Locking objects
set up for SUM. All other attributes from struct opal_user_lr_setup
(RLE - read locking enabled, WLE - write locking enabled) shall
remain in possession of the User authority associated with the Locking
object set for SUM.

Therefore, we need a separate function for setting up locking range
start and locking range length because it may require two different
authorities (and sessions) if the RangeStartRangeLengthPolicy attribute
is set.

With the IOC_OPAL_LR_SET_START_LEN ioctl, the opal_user_lr_setup
members 'RLE' and 'WLE' of the ioctl argument are ignored.
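To make the split between this ioctl and IOC_OPAL_ENABLE_DISABLE_LR concrete, the model below records which opal_user_lr_setup fields each one honours. It is only a sketch: `lr_setup_sketch`, `locking_range_sketch` and `apply_lr_setup()` are hypothetical stand-ins (the real uapi struct also carries session information), and nothing here talks to a drive or implements the Opal protocol.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the opal_user_lr_setup fields named above. */
struct lr_setup_sketch {
    uint64_t range_start;
    uint64_t range_length;
    uint32_t RLE;   /* read locking enabled */
    uint32_t WLE;   /* write locking enabled */
};

/* Hypothetical model of a Locking object's state. */
struct locking_range_sketch {
    uint64_t start;
    uint64_t length;
    uint32_t rle;
    uint32_t wle;
};

enum lr_op_sketch { OP_ENABLE_DISABLE_LR, OP_LR_SET_START_LEN };

/* Copy only the fields the given ioctl honours; all others are ignored. */
static void apply_lr_setup(struct locking_range_sketch *lr,
                           const struct lr_setup_sketch *s,
                           enum lr_op_sketch op)
{
    switch (op) {
    case OP_ENABLE_DISABLE_LR:
        /* RLE/WLE stay with the User authority under SUM. */
        lr->rle = s->RLE;
        lr->wle = s->WLE;
        break;
    case OP_LR_SET_START_LEN:
        /* Start/length may belong to Admin if RangeStartRangeLengthPolicy is set. */
        lr->start = s->range_start;
        lr->length = s->range_length;
        break;
    }
}
```

Because the two attribute groups travel through separate calls, each call can be issued in a session opened under whichever authority owns those attributes.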

Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 14:29:59 -06:00
Ondrej Kozina aca086ff27 sed-opal: add IOC_OPAL_REACTIVATE_LSP.
This adds the 'Reactivate' method as described in the
"TCG Storage Opal SSC Feature Set: Single User Mode"
document (ch. 3.1.1.1).

The method enables switching an already active SED OPAL2 device,
with appropriate firmware support for Single User Mode (SUM),
to or from SUM.

Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-and-tested-by: Milan Broz <gmazyland@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 14:29:59 -06:00
Pavel Begunkov d8345a2190 io_uring/timeout: immediate timeout arg
One of the things the user always has to keep in mind is that any user
pointers they put into an SQE are not read by the kernel until
submission happens, and the user has to ensure the pointee stays alive
until then. For example, the snippet below leads to a UAF of the
on-stack variable ts. Instead of passing the timeout value as a pointer,
allow storing it directly in the SQE. The user has to set a new flag
called IORING_TIMEOUT_IMMEDIATE_ARG, in which case sqe->addr for timeout
requests or sqe->addr2 for timeout update requests is interpreted as a
time value in nanoseconds.

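void prep_timeout(struct io_uring_sqe *sqe) {
    struct __kernel_timespec ts = {...};
    /* stores &ts in the SQE; the kernel reads it only at submission */
    io_uring_prep_timeout(sqe, &ts, 0, 0);
} /* ts goes out of scope here */

void submit(struct io_uring *ring) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    prep_timeout(sqe);
    io_uring_submit(ring); /* the kernel dereferences the dangling &ts */
}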
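The by-value alternative can be sketched as below: the timespec is flattened into one nanosecond count and stored in the SQE itself, so nothing on the caller's stack needs to outlive the prep call. `ts_sketch`, `sqe_sketch` and `prep_timeout_imm()` are hypothetical stand-ins rather than the uapi types, and the sketch does not set IORING_TIMEOUT_IMMEDIATE_ARG because it does not use the real headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of struct __kernel_timespec (both fields 64-bit). */
struct ts_sketch {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Hypothetical SQE subset: only the two address fields matter here. */
struct sqe_sketch {
    uint64_t addr;
    uint64_t addr2;
};

#define NSEC_PER_SEC_SKETCH 1000000000ULL

/* Flatten a timespec into a single nanosecond count. */
static uint64_t timeout_to_ns(const struct ts_sketch *ts)
{
    return (uint64_t)ts->tv_sec * NSEC_PER_SEC_SKETCH + (uint64_t)ts->tv_nsec;
}

/*
 * Store the timeout by value: addr for timeout requests, addr2 for timeout
 * update requests. A real request would also carry the
 * IORING_TIMEOUT_IMMEDIATE_ARG flag, which this sketch omits.
 */
static void prep_timeout_imm(struct sqe_sketch *sqe,
                             const struct ts_sketch *ts, int is_update)
{
    uint64_t ns = timeout_to_ns(ts);

    if (is_update)
        sqe->addr2 = ns;
    else
        sqe->addr = ns;
}
```

Since the SQE holds a copy of the value, the caller's timespec may go out of scope before submission without creating a dangling pointer.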

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:21:54 -06:00
Pavel Begunkov d9d2455e77 io_uring/zcrx: move zcrx uapi into separate header
Split out zcrx uapi into a separate file. It'll be easier to manage it
this way, and that reduces the size of the not-so-small io_uring.h.
Since there are users that expect zcrx definitions to come with
io_uring.h, io_uring.h includes the new file.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-09 07:21:54 -06:00
Linus Torvalds dfb3142844 drm fixes for 7.0-rc3
mm:
 - mm: Fix a hmm_range_fault() livelock / starvation problem
 
 pagemap:
 - Revert "drm/pagemap: Disable device-to-device migration"
 
 ttm:
 - fix function return breaking reclaim
 - fix build failure on PREEMPT_RT
 - fix bo->resource UAF
 
 dma-buf:
 - include ioctl.h in uapi header
 
 sched:
 - fix kernel doc warning
 
 amdgpu:
 - LUT fixes
 - VCN5 fix
 - Dispclk fix
 - SMU 13.x fix
 - Fix race in VM acquire
 - PSP 15.x fix
 - UserQ fix
 
 amdxdna:
 - fix invalid payload for failed command
 - fix NULL ptr dereference
 - fix major fw version check
 - avoid inconsistent fw state on error
 
 i915/display:
 - Fix for Lenovo T14 G7 display not refreshing
 
 xe:
 - Do not preempt fence signaling CS instructions
 - Some leak and finalization fixes
 - Workaround fix
 
 nouveau:
 - avoid runtime suspend oops when using dp aux
 
 panthor:
 - fix gem_sync argument ordering
 
 solomon:
 - fix incorrect display output
 
 renesas:
 - fix DSI divider programming
 
 ethosu:
 - fix job submit error clean-up refcount
 - fix NPU_OP_ELEMENTWISE validation
 - handle possible underflows in IFM size calcs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmmrODEACgkQDHTzWXnE
 hr6ENA/9E+ABsQPUSr1ZUR6MfS3FdDMfyNi34u9F0XE0vL/6J4DIN2MXHJN0yj4S
 YeR8RIH1uuteexRCVjK8gYrDFZm5GpdJ7tZsPBogKSjfr4P4zvSdcyP0DtubmHN8
 k/Rbdr1m2BnAerviPUlQ09voG4VyM/IlnkOj/aUrAG4xHfNLWhQ/eY21HjvTaMEw
 eNPk75G32AkusybhYGLcfRKUqmZ9Sh/EAIOiqp5Gd61bSbNXFbXqU59AliRSW16D
 dRMjCenQbBtXnf1xzn9xmC95Mr5NhlId3BddY6nsmTvqnse50w1IBw4MbA5703iv
 teNAsRHNWBiyachtolylu63z8fjc3mqRgyB3qoYCIn4pGm6wej3SSuOr+WaKTPMr
 ucKNxvsluiXOIdUrkLYyl1LTU1Bqs2/mFTPzyHFykeB72swHGyCfYyh/RgL+dYIc
 nuOsY1n4JR1JCkg5p4Y8LZq0RXLE6PJfyxmpF9W2r9DVV4T52Yk5fiXEo+HJ9WaJ
 yaa3xBJJLP0z50JuaV7iEIiXQBwj482m0No+jG4+ZTatRm/4oz0cHpmoRkPuN6Og
 vc6UdFUtRysVqSetLfV1oBpLm3zkhumDxtrqb6GHB7n+ahY7mFCLc175wFXoGCqu
 DxCshK5FZchyPbOXaxOWyIVXDCYJtCJnK4vBCHwkJAkxLQCpv/k=
 =yGA4
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2026-03-07' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Weekly fixes pull.

  There is one mm fix in here for a HMM livelock triggered by the xe
  driver tests. Otherwise it's a pretty wide range of fixes across the
  board, ttm UAF regression fix, amdgpu fixes, nouveau doesn't crash my
  laptop anymore fix, and a fair bit of misc.

  Seems about right for rc3.

  mm:
   - mm: Fix a hmm_range_fault() livelock / starvation problem

  pagemap:
   - Revert "drm/pagemap: Disable device-to-device migration"

  ttm:
   - fix function return breaking reclaim
   - fix build failure on PREEMPT_RT
   - fix bo->resource UAF

  dma-buf:
   - include ioctl.h in uapi header

  sched:
   - fix kernel doc warning

  amdgpu:
   - LUT fixes
   - VCN5 fix
   - Dispclk fix
   - SMU 13.x fix
   - Fix race in VM acquire
   - PSP 15.x fix
   - UserQ fix

  amdxdna:
   - fix invalid payload for failed command
   - fix NULL ptr dereference
   - fix major fw version check
   - avoid inconsistent fw state on error

  i915/display:
   - Fix for Lenovo T14 G7 display not refreshing

  xe:
   - Do not preempt fence signaling CS instructions
   - Some leak and finalization fixes
   - Workaround fix

  nouveau:
   - avoid runtime suspend oops when using dp aux

  panthor:
   - fix gem_sync argument ordering

  solomon:
   - fix incorrect display output

  renesas:
   - fix DSI divider programming

  ethosu:
   - fix job submit error clean-up refcount
   - fix NPU_OP_ELEMENTWISE validation
   - handle possible underflows in IFM size calcs"

* tag 'drm-fixes-2026-03-07' of https://gitlab.freedesktop.org/drm/kernel: (38 commits)
  accel: ethosu: Handle possible underflow in IFM size calculations
  accel: ethosu: Fix NPU_OP_ELEMENTWISE validation with scalar
  accel: ethosu: Fix job submit error clean-up refcount underflows
  accel/amdxdna: Split mailbox channel create function
  drm/panthor: Correct the order of arguments passed to gem_sync
  Revert "drm/syncobj: Fix handle <-> fd ioctls with dirty stack"
  drm/ttm: Fix bo resource use-after-free
  nouveau/dpcd: return EBUSY for aux xfer if the device is asleep
  accel/amdxdna: Fix major version check on NPU1 platform
  drm/amdgpu/userq: refcount userqueues to avoid any race conditions
  drm/amdgpu/userq: Consolidate wait ioctl exit path
  drm/amdgpu/psp: Use Indirect access address for GFX to PSP mailbox
  drm/amdgpu: Fix use-after-free race in VM acquire
  drm/amd/pm: remove invalid gpu_metrics.energy_accumulator on smu v13.0.x
  drm/xe: Fix memory leak in xe_vm_madvise_ioctl
  drm/xe/reg_sr: Fix leak on xa_store failure
  drm/xe/xe2_hpg: Correct implementation of Wa_16025250150
  drm/xe/gsc: Fix GSC proxy cleanup on early initialization failure
  Revert "drm/pagemap: Disable device-to-device migration"
  drm/i915/psr: Fix for Panel Replay X granularity DPCD register handling
  ...
2026-03-06 13:29:12 -08:00
Linus Torvalds 3ad66a34cc io_uring-7.0-20260305
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmqPTAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpueIEACHF0uws+uZSEsy9LUyC9ha8+5YN9szIJ3K
 QUGxa9pmCnQG5K50KpxyEYP6buaJDy1smJGgD2obkeGncC4w6xKK2kQTQV1U+C1C
 YA+7B/3HLhz5AWS6GIbRy6VZ599I4evlF8W79dX8BTnF8Y1ddkSuUnKx//q0AoQZ
 hr3foglcFlchy8JuQ2/MpxzfOouvNMdMmeUN4O+t8iXDrmePFYIOgrLcT+ObgC5D
 SXWx2cc3hMJ35hcSzedMWEBFcXnkX9nfh8Hd/+uPRcKsIwS8kCo6z01GoT/BCPRA
 jdrxAfoYSL16HPfq6GU52n6iCaRd+5NK+tt/ECCzTxGL32Hadrr+nxw4O7g3Q96u
 07zeaqHSoTGUchtlqrGjALQLP2yxdACEjxMh3rfdStRv3x3bbbVVDdioVEzPukCr
 EBA+AbqaaG3LIYXwcY+15zx5NrAfeBAP1RjLgoV0s2ch4ghEqvnZGY4NLBDkcQ2R
 97tM9+OdecBrsnlQr5GBoDbwpqc2pDEqSjkYDwoXqvqXs0DrMRq2MQw1Hjjh7Z7G
 FZx1KNTiLB/YQ0sSyMcUKnH+qBA0FxwN/C6dDnRjj4dH5YsoeG/GhsS3B00a+0yE
 S3MKrsf+uN21OYLVPSTEN6qS+02ZvK6E/Aw7/fk2IV60JMeM5KvCccmxa53dKls8
 iyEJ7nVLOg==
 =xyKA
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-7.0-20260305' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

 - Fix a typo in the mock_file help text

 - Fix a comment regarding IORING_SETUP_TASKRUN_FLAG in the
   io_uring.h UAPI header

 - Use READ_ONCE() for reading refill queue entries

 - Reject SEND_VECTORIZED for fixed buffer sends, as it isn't
   implemented. Currently this flag is silently ignored

   This is in preparation for making these work, but first we
   need a fixup so that older kernels will correctly reject them

 - Ensure "0" means default for the rx page size
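The SEND_VECTORIZED item follows a common uapi pattern: refuse flag combinations the kernel cannot honour so that they can be enabled later without ambiguity. The sketch below models that check with made-up flag names (`SK_SEND_*`) and a hypothetical `validate_send_flags()`; it is not the io_uring/net code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag bits for illustration; not the real io_uring values. */
#define SK_SEND_FIXED_BUF   (1u << 0)
#define SK_SEND_VECTORIZED  (1u << 1)
#define SK_SEND_KNOWN       (SK_SEND_FIXED_BUF | SK_SEND_VECTORIZED)

/*
 * Return 0 for supported combinations and -EINVAL otherwise. Rejecting the
 * unimplemented combination, instead of silently ignoring a flag, lets
 * userspace probe for support, and lets a later kernel implement it without
 * changing the meaning of requests that older kernels accepted.
 */
static int validate_send_flags(unsigned int flags)
{
    if (flags & ~SK_SEND_KNOWN)
        return -EINVAL;  /* unknown bits */
    if ((flags & SK_SEND_FIXED_BUF) && (flags & SK_SEND_VECTORIZED))
        return -EINVAL;  /* known, but not implemented together */
    return 0;
}
```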

* tag 'io_uring-7.0-20260305' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/zcrx: use READ_ONCE with user shared RQEs
  io_uring/mock: Fix typo in help text
  io_uring/net: reject SEND_VECTORIZED when unsupported
  io_uring: correct comment for IORING_SETUP_TASKRUN_FLAG
  io_uring/zcrx: don't set rx_page_size when not requested
2026-03-06 08:31:36 -08:00
Dave Airlie 3d3234d5da A return type fix for ttm, a display fix for solomon, several misc fixes
for amdxdna, a DSI clock rate fix for rz-du, a uapi fix for syncobj, a
 possible build failure fix for dma-buf, a doc warning fix for sched, a
 build failure fix for ttm tests, and a crash fix when suspended for
 nouveau.
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaak6GgAKCRAnX84Zoj2+
 dkHpAX91/gbgY5FDu7va/7Ybo3oH/YvZOIQsbOz0sfJsjnszyKT3Wh4MGM8QphlI
 93YHoi8Bf2M++H1mQgFrm97kjISmjgZYufM+6Cy92oqMO/SKCxiHTCBRnTBxas1B
 CXek10L1Pg==
 =jxDJ
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2026-03-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

A return type fix for ttm, a display fix for solomon, several misc fixes
for amdxdna, a DSI clock rate fix for rz-du, a uapi fix for syncobj, a
possible build failure fix for dma-buf, a doc warning fix for sched, a
build failure fix for ttm tests, and a crash fix when suspended for
nouveau.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20260305-ludicrous-quirky-raven-7cdafd@houat
2026-03-06 19:40:00 +10:00
Ricardo Robaina f3e334fb7f audit: fix coding style issues
Fix various coding style issues across the audit subsystem flagged
by the checkpatch.pl script to adhere to kernel coding standards.

Specific changes include:
- kernel/auditfilter.c: Move the open brace '{' to the previous line
  for the audit_ops array declaration.
- lib/audit.c: Add a required space before the open parenthesis '('.
- include/uapi/linux/audit.h: Enclose the complex macro value for
  AUDIT_UID_UNSET in parentheses.
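Why checkpatch wants complex macro values parenthesised can be shown with a cast-style constant such as `(unsigned int)-1`. The macro names below are hypothetical; the sketch only demonstrates that an unparenthesised cast changes meaning next to `sizeof`, which binds to the parenthesised type name first.

```c
#include <assert.h>

/* Hypothetical names: the same value with and without outer parentheses. */
#define UID_UNSET_BARE   (unsigned int)-1
#define UID_UNSET_PAREN  ((unsigned int)-1)

/* In ordinary comparisons both forms yield the same value... */
static_assert(UID_UNSET_BARE == UID_UNSET_PAREN, "same value");

/* ...but the bare form parses as (sizeof(unsigned int)) - 1 here. */
static_assert(sizeof UID_UNSET_BARE == sizeof(unsigned int) - 1,
              "precedence trap");
static_assert(sizeof UID_UNSET_PAREN == sizeof(unsigned int), "as intended");
```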

Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-03-05 22:16:08 -05:00
Isaac J. Manjarres a116bac871 dma-buf: Include ioctl.h in UAPI header
include/uapi/linux/dma-buf.h uses several macros from ioctl.h to define
its ioctl commands. However, it does not include ioctl.h itself. So,
if userspace source code tries to include the dma-buf.h file without
including ioctl.h, it can result in build failures.

Therefore, include ioctl.h in the dma-buf UAPI header.

Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20260303002309.1401849-1-isaacmanjarres@google.com
2026-03-03 08:55:39 +01:00
Ricardo Robaina f6c2996709 audit: fix whitespace alignment in include/uapi/linux/audit.h
Fixed minor indentation inconsistencies in the audit macros
to align with standard kernel coding style using 8-character
hard tabs.

Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
[PM: fixed a space before tab issue in the patch]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-03-02 16:31:05 -05:00
Linus Torvalds 6170625149 Miscellaneous fixes:
- Fix zero_vruntime tracking when there's a single task running
 
  - Fix slice protection logic
 
  - Fix the ->vprot logic for reniced tasks
 
  - Fix lag clamping in mixed slice workloads
 
  - Fix objtool uaccess warning (and bug) in the
    !CONFIG_RSEQ_SLICE_EXTENSION case caused by unexpected
    un-inlining, which triggers with older compilers
 
  - Fix a comment in the rseq registration rseq_size bound check code
 
  - Fix a legacy RSEQ ABI quirk that handled 32-byte area sizes
    differently, a special size that the area has now reached naturally
    and that we want to avoid. The visible ugliness of the new reserved
    field will be avoided the next time the RSEQ area is extended.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmkAl0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hdsg/+IdQpmtEXsugE1FqEuuptm0ld6hcFI9WC
 mlEXhid5Gq3a3KMBv0CLd73o+k8Ju/BDEdbfLMzY8A9h8OxfnuUL1T6Jt4q7dF1h
 76ja1R+i+GNFcXWmSG8z6FUns4bRBJeNWFs3dzCFE9N2qOCCj1xBr/9BqgKvNVfZ
 cbcaiMvmi3z/vPUmT8hdMdEcA0Zo2gVcKDmny4Tca9sigyLZD8FqtW1FhqL1HX8H
 Cx8fZ2lkD2z6gKtOAbC3QuWVmP88tvZldaMsGHTAQIa14PP5h2xhyuLxBF1Zjnwy
 aWl4iYr6ILu3LRi54CmQOiESdEf3Srdbl8JxDcvU9vh8ecqXvDGPUB2xCszlPvOx
 R+scskNgNyd1WtUF2VYFLTNkj0B7Xe6eTYfIu2d5r8GrRt0YjRzsK/JQallAkV6V
 KORDm4/Xyl5Ss6tNtfZP7lpHD2qykscRGxgr0HjjJCyjA1ZNtGc1A+JKZ8D8q9Nq
 rxEbaa65KfAtYJ4i5j9goFPQwNeHXm/emToVzEfyKwZHs3ns0LwffDGSFFOYSm/p
 FVVmi9iSoxRvRFHBflvBIwFaCnIyBLTJZlB/Bp8MVaFnv+6OzdE/nfcKNaYqcVaT
 mzCpY2DFTx5KISmJR7DAWsPntoRV6WPcxVApWicTaT5G3C2TLvvTAEq8g2WIYDFB
 j6oNyEkX/Xw=
 =Nxqx
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:

 - Fix zero_vruntime tracking when there's a single task running

 - Fix slice protection logic

 - Fix the ->vprot logic for reniced tasks

 - Fix lag clamping in mixed slice workloads

 - Fix objtool uaccess warning (and bug) in the
   !CONFIG_RSEQ_SLICE_EXTENSION case caused by unexpected un-inlining,
   which triggers with older compilers

 - Fix a comment in the rseq registration rseq_size bound check code

 - Fix a legacy RSEQ ABI quirk that handled 32-byte area sizes
   differently, a special size that the area has now reached naturally
   and that we want to avoid. The visible ugliness of the new reserved
   field will be avoided the next time the RSEQ area is extended.

* tag 'sched-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: slice ext: Ensure rseq feature size differs from original rseq size
  rseq: Clarify rseq registration rseq_size bound check comment
  sched/core: Fix wakeup_preempt's next_class tracking
  rseq: Mark rseq_arm_slice_extension_timer() __always_inline
  sched/fair: Fix lag clamp
  sched/eevdf: Update se->vprot in reweight_entity()
  sched/fair: Only set slice protection at pick time
  sched/fair: Fix zero_vruntime tracking
2026-03-01 11:09:24 -08:00
Jens Axboe 0ed2e8bf61 io_uring: correct comment for IORING_SETUP_TASKRUN_FLAG
Sync with a recent liburing fix, which corrects the comment explaining
when the IORING_SETUP_TASKRUN_FLAG setup flag is valid to use. It may be
used with COOP_TASKRUN or DEFER_TASKRUN, and is not useful unless one of
these task_work mechanisms is in use.

Link: https://github.com/axboe/liburing/pull/1543
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-28 04:56:20 -07:00
Bjorn Helgaas 39195990e4 PCI: Correct PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value
fb82437fdd ("PCI: Change capability register offsets to hex") incorrectly
converted the PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value from decimal 52 to hex
0x32:

  -#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52      /* v2 endpoints with link end here */
  +#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 0x32    /* end of v2 EPs w/ link */

This broke PCI capabilities in a VMM because subsequent ones weren't
DWORD-aligned.

Change PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 to the correct value of 0x34.
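The arithmetic behind the fix can be checked directly. The sketch assumes, as the report implies, that a VMM places the next capability immediately after a DWORD-aligned endpoint block; the macro names and `dword_aligned()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* 0x34 is decimal 52; 0x32 (decimal 50) was the mis-converted value. */
#define ENDPOINT_SIZEOF_V2_FIXED  0x34
#define ENDPOINT_SIZEOF_V2_BROKEN 0x32

static_assert(ENDPOINT_SIZEOF_V2_FIXED == 52, "0x34 is decimal 52");
static_assert(ENDPOINT_SIZEOF_V2_BROKEN == 50, "0x32 is decimal 50");

/* PCI capabilities must start on a DWORD (4-byte) boundary. */
static bool dword_aligned(unsigned int offset)
{
    return (offset & 3u) == 0;
}
```

With a base of 0x40, for example, the next capability lands at 0x74 (aligned) with the fixed value but at 0x72 (misaligned) with the broken one.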

fb82437fdd was from Baruch Siach <baruch@tkos.co.il>, but this was not
Baruch's fault; it's a mistake I made when applying the patch.

Fixes: fb82437fdd ("PCI: Change capability register offsets to hex")
Reported-by: David Woodhouse <dwmw2@infradead.org>
Closes: https://lore.kernel.org/all/3ae392a0158e9d9ab09a1d42150429dd8ca42791.camel@infradead.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2026-02-27 10:24:25 -06:00
Mathieu Desnoyers 3b68df9781 rseq: slice ext: Ensure rseq feature size differs from original rseq size
Before rseq became extensible, its original size was 32 bytes even
though the active rseq area was only 20 bytes. This had the following
impact in terms of userspace ecosystem evolution:

* GNU libc versions 2.35 through 2.39 expose a __rseq_size symbol set
  to 32, even though the size of the active rseq area is really 20.
* GNU libc 2.40 changes this __rseq_size to 20, thus making it
  express the active rseq area.
* Starting from glibc 2.41, __rseq_size corresponds to the
  AT_RSEQ_FEATURE_SIZE from getauxval(3).

This means that users of __rseq_size can always expect it to
correspond to the active rseq area, except for the value 32, for
which the active rseq area is 20 bytes.

Exposing a 32-byte feature size would make life needlessly painful
for userspace. Therefore, add a reserved field at the end of the
rseq area to bump the feature size to 33 bytes. This reserved field
is expected to be replaced with whatever field comes next, on the
expectation that this field will be larger than 1 byte.

The effect of this change is to increase the size from 32 to 64 bytes
before we actually have fields using that memory.

Clarify the allocation size and alignment requirements in the struct
rseq uapi comment.

Change getauxval(AT_RSEQ_ALIGN) to return the active rseq area size
rounded up to the next power of 2, which guarantees that the rseq
structure will always be aligned on the smallest power of two large
enough to contain it, even as it grows. Change the alignment check in
the rseq registration accordingly.

This will minimize the number of ABI corner cases we need to document
and require userspace to play games with. The rule stays simple when
__rseq_size != 32:

  #define rseq_field_available(field)	(__rseq_size >= offsetofend(struct rseq_abi, field))
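A userspace check that also covers the legacy value 32, plus the power-of-two alignment rule, might look like the sketch below. `struct rseq_abi` here is a hypothetical mirror of the leading struct rseq fields (the kernel uapi header is the authority), and the helper names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define offsetofend(TYPE, MEMBER) \
    (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical mirror of the leading struct rseq fields. */
struct rseq_abi {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    uint64_t rseq_cs;
    uint32_t flags;
    uint32_t node_id;
    uint32_t mm_cid;
};

/* The original active area ends right after 'flags'. */
static_assert(offsetofend(struct rseq_abi, flags) == 20, "20-byte area");

/* __rseq_size == 32 is the legacy glibc value meaning "20 bytes active". */
static size_t rseq_active_size(size_t rseq_size)
{
    return rseq_size == 32 ? 20 : rseq_size;
}

/* True if a field ending at field_end lies inside the active area. */
static int rseq_field_available_sz(size_t rseq_size, size_t field_end)
{
    return rseq_active_size(rseq_size) >= field_end;
}

/* Round a feature size up to the next power of two (alignment rule). */
static size_t rseq_align_for(size_t feature_size)
{
    size_t align = 1;

    while (align < feature_size)
        align <<= 1;
    return align;
}
```

With the 33-byte feature size described above, `rseq_align_for(33)` gives 64, matching the stated growth from 32 to 64 bytes.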

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260220200642.1317826-3-mathieu.desnoyers@efficios.com
2026-02-23 11:19:19 +01:00
Linus Torvalds d31558c077 hyperv-next for v7.0
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmmWuQwTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXnnHB/41Jji+y8FHe2SqpQhUOqHb6NDEr3GX
 YpAybhz2IsBHVhbCQn789UiIcSr0UDR7wnVLAmXe+5eY/jRwNggIO3tFqLYn92pK
 KSTNafgNbLxh3iKBxRsUy0b3JutjD2LytkpFj2KVbBsZfmRxCZmKIV/4V18rV+fA
 uemvoqLwU7emEWkhZ24suHMHPVpv6xKs9O6gOrQ4+zXR0g//eMLDqb17uj8h+8sM
 ZsPsMYeuOihXlvGeBRjbnWYjA1ODWGDvwR9VT+VU4+HWht/KSr15EGeXZdV2eZUt
 e/8swbqOS94a2ZjOgStzVkcPqAF88t9zZ+gvYElTDzLlHjqbrZdpeDDt
 =A7tT
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20260218' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull Hyper-V updates from Wei Liu:

 - Debugfs support for MSHV statistics (Nuno Das Neves)

 - Support for the integrated scheduler (Stanislav Kinsburskii)

 - Various fixes for MSHV memory management and hypervisor status
   handling (Stanislav Kinsburskii)

 - Expose more capabilities and flags for MSHV partition management
   (Anatol Belski, Muminul Islam, Magnus Kulke)

 - Miscellaneous fixes to improve code quality and stability (Carlos
   López, Ethan Nelson-Moore, Li RongQing, Michael Kelley, Mukesh
   Rathor, Purna Pavan Chandra Aekkaladevi, Stanislav Kinsburskii, Uros
   Bizjak)

 - PREEMPT_RT fixes for vmbus interrupts (Jan Kiszka)

* tag 'hyperv-next-signed-20260218' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (34 commits)
  mshv: Handle insufficient root memory hypervisor statuses
  mshv: Handle insufficient contiguous memory hypervisor status
  mshv: Introduce hv_deposit_memory helper functions
  mshv: Introduce hv_result_needs_memory() helper function
  mshv: Add SMT_ENABLED_GUEST partition creation flag
  mshv: Add nested virtualization creation flag
  Drivers: hv: vmbus: Simplify allocation of vmbus_evt
  mshv: expose the scrub partition hypercall
  mshv: Add support for integrated scheduler
  mshv: Use try_cmpxchg() instead of cmpxchg()
  x86/hyperv: Fix error pointer dereference
  x86/hyperv: Reserve 3 interrupt vectors used exclusively by MSHV
  Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT
  x86/hyperv: Remove ASM_CALL_CONSTRAINT with VMMCALL insn
  x86/hyperv: Use savesegment() instead of inline asm() to save segment registers
  mshv: fix SRCU protection in irqfd resampler ack handler
  mshv: make field names descriptive in a header struct
  x86/hyperv: Update comment in hyperv_cleanup()
  mshv: clear eventfd counter on irqfd shutdown
  x86/hyperv: Use memremap()/memunmap() instead of ioremap_cache()/iounmap()
  ...
2026-02-20 08:48:31 -08:00
Linus Torvalds 8bf22c33e7 Including fixes from Netfilter.
Current release - new code bugs:
 
  - net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT
 
  - eth: mlx5e: XSK, Fix unintended ICOSQ change
 
  - phy_port: correctly recompute the port's linkmodes
 
  - vsock: prevent child netns mode switch from local to global
 
  - couple of kconfig fixes for new symbols
 
 Previous releases - regressions:
 
  - nfc: nci: fix false-positive parameter validation for packet data
 
  - net: do not delay zero-copy skbs in skb_attempt_defer_free()
 
 Previous releases - always broken:
 
  - mctp: ensure our nlmsg responses to user space are zero-initialised
 
  - ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()
 
  - fixes for ICMP rate limiting
 
 Misc:
 
  - intel: fix PCI device ID conflict between i40e and ipw2200
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmmXUh8ACgkQMUZtbf5S
 IrufYA//ZVj+4gvegqKwKZYXNBndVW00GGTYqaILbaenK1olUVUelVB91eV2Klc/
 dXCeKG/MgEPuT89IjkPzVr2Yg4x6uhjcQL1rsahORn+GuQfSI/P8y7ysDOPnHVeM
 Rtsg1m8z3EizJcHPeAJe7nEqFzfvZ2m+FCEGe++z8BYaUZUVApytgpIWOHO/aB+p
 t13bCNzd05XxPphMl610T00Fncj2jCVDHILMgTB5rmFmkeJuQwNrRGXQSoQame46
 +g+yCZjT0eVTrBaH1EUssWfrOT3VJj3BEee6gSp7k9mxMkbW18i8shBgmxS+EHjk
 u19wwBzSrHK+JY1UExim+1E/rZisQVmEE1Gs0ALedxAu9zC/Julzfa2/+BFsc0j7
 QTXd4jukG3aTPIX8v3TV2Igu0j+bAT4WdpzvnsXXBMVKy7wFYMd1+aSOLyFH2W9L
 qRbg50oUATcsz77bZt6YUTJEgua4HXNYGtn15FMZOR7HJVR2L44Q5TK5mQxGp5iM
 GabeKMzg6bsjE98STM3nbWks3pIb9ptIk++i0913eSqKgn84bDPtp3Gabfgle2SJ
 8gjKS61K8rDt5x8StXVod7oGQ4asL8RJyOtE/avgbWUu9BNH8/oKqsE6TQrpXauv
 1ndiyim/mPe4fBCxkVAi2+uq5/ph9z8XyleESz9VYwyL3Rl4nsg=
 =qSCj
 -----END PGP SIGNATURE-----

Merge tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from Netfilter.

  Current release - new code bugs:

   - net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT

   - eth: mlx5e: XSK, Fix unintended ICOSQ change

   - phy_port: correctly recompute the port's linkmodes

   - vsock: prevent child netns mode switch from local to global

   - couple of kconfig fixes for new symbols

  Previous releases - regressions:

   - nfc: nci: fix false-positive parameter validation for packet data

   - net: do not delay zero-copy skbs in skb_attempt_defer_free()

  Previous releases - always broken:

   - mctp: ensure our nlmsg responses to user space are zero-initialised

   - ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()

   - fixes for ICMP rate limiting

  Misc:

   - intel: fix PCI device ID conflict between i40e and ipw2200"

* tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
  net: nfc: nci: Fix parameter validation for packet data
  net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
  net/mlx5e: Fix deadlocks between devlink and netdev instance locks
  net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
  net/mlx5: Fix misidentification of write combining CQE during poll loop
  net/mlx5e: Fix misidentification of ASO CQE during poll loop
  net/mlx5: Fix multiport device check over light SFs
  bonding: alb: fix UAF in rlb_arp_recv during bond up/down
  bnge: fix reserving resources from FW
  eth: fbnic: Advertise supported XDP features.
  rds: tcp: fix uninit-value in __inet_bind
  net/rds: Fix NULL pointer dereference in rds_tcp_accept_one
  octeontx2-af: Fix default entries mcam entry action
  net/mlx5e: XSK, Fix unintended ICOSQ change
  ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero
  ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero
  ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()
  inet: move icmp_global_{credit,stamp} to a separate cache line
  icmp: prevent possible overflow in icmp_global_allow()
  selftests/net: packetdrill: add ipv4-mapped-ipv6 tests
  ...
2026-02-19 10:39:08 -08:00
Anatol Belski 8927a108a7 mshv: Add SMT_ENABLED_GUEST partition creation flag
Add support for HV_PARTITION_CREATION_FLAG_SMT_ENABLED_GUEST
to allow userspace VMMs to enable SMT for guest partitions.

Expose this via a new MSHV_PT_BIT_SMT_ENABLED_GUEST flag in the UAPI.

Without this flag, the hypervisor schedules guest VPs incorrectly,
making SMT unusable.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18 23:54:37 +00:00
Muminul Islam a284dbc96a mshv: Add nested virtualization creation flag
Introduce HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE to
indicate support for nested virtualization during partition creation.

This enables clearer configuration and capability checks for nested
virtualization scenarios.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18 23:53:22 +00:00
Linus Torvalds e81dd54f62 dmaengine updates for v7.0
Core:
   - Add Frank Li as subsystem reviewer to help with reviews
 
  New Support:
   - Mediatek support for Dimensity 6300 and 9200 controller
   - Qualcomm Kaanapali and Glymur GPI DMA engine support
   - Synopsys DW AXI Agilex5 support
   - Renesas RZ/V2N SoC support
   - Atmel microchip lan9691-dma support
   - Tegra ADMA tegra264 support
 
  Updates:
   - sg_nents_for_dma() helper use in subsystem
   - pm_runtime_mark_last_busy() redundant call update for subsystem
   - Residue support for xilinx AXIDMA driver
   - Intel Max SGL Size Support and capabilities for DSA3.0
   - AXI dma larger than 32bits address support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmmUkGgACgkQfBQHDyUj
 g0d6uRAAnVfM6GxVt4PuRd1t+i6qeNhqZrq+8001YtFgOJp0hxPX7k9PP1F42kjp
 1zrICvvdqH8gw8+8AaT2JpIZp4vENmjGdnkCo+HU6FgGPCUmkkpehPk58Y2K3r0a
 LbHNjjj7V7SDGs9SzT2It+d7KfKnv1adushBhuO7xv524hwSuetw1q8CnLPoxaPx
 KNWToovCp5tlHCucWQAdmd4bPsUv1mFMvlJxK4a26WKeL7lU6BeDS06rLTNq5PNZ
 51sYdSvyBOSCUcFGToeebJFsKCQukryZTXTtsKMsmLvmHaTMahu2TwNzQ+PRSBSr
 kZ9GpS51tet67txGzGzJRFGDY9quKFrajQ60Om6dr9aYm2xW7gEZFa0NKTlz9q7w
 kbwsPgd87sYI8MDWpinAuvwUS2OXnihjqdYVp0QouJd8eRiH1pWagtjubGRVYekC
 eEZjyxpz8ZD+LT2G3I0uy2FnqCkcEjSfchBCtuPcxSSSkHRXVf4tgUAI833YIdek
 gtd7h+/jepcVOcVAeaMVvVYnNIhVkHQkQC1/HmZCqNoyoY/oK8JcUF3UskzP7BPW
 gvEJhtFD0RBInu5UM0rS31zF+8Q9EMbDXKY2PiCCyxtsjAe5yWyeNNsUx9DN8ixv
 XyclsF7javUOZSoxzH3mCLy+x51p84Mq2KGQGL9H7PK/hWDMmoo=
 =nrcD
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine updates from Vinod Koul:
 "Core:
   - Add Frank Li as subsystem reviewer to help with reviews

  New Support:
   - Mediatek support for Dimensity 6300 and 9200 controller
   - Qualcomm Kaanapali and Glymur GPI DMA engine
   - Synopsys DW AXI Agilex5
   - Renesas RZ/V2N SoC
   - Atmel microchip lan9691-dma
   - Tegra ADMA tegra264

  Updates:
   - sg_nents_for_dma() helper use in subsystem
   - pm_runtime_mark_last_busy() redundant call update for subsystem
   - Residue support for xilinx AXIDMA driver
   - Intel Max SGL Size Support and capabilities for DSA3.0
   - AXI dma larger than 32bits address support"

* tag 'dmaengine-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (64 commits)
  dmaengine: add Frank Li as reviewer
  dt-bindings: dma: qcom,gpi: Update max interrupts lines to 16
  dmaengine: fsl-edma: don't explicitly disable clocks in .remove()
  dmaengine: xilinx: xdma: use sg_nents_for_dma() helper
  dmaengine: sh: use sg_nents_for_dma() helper
  dmaengine: sa11x0: use sg_nents_for_dma() helper
  dmaengine: qcom: bam_dma: use sg_nents_for_dma() helper
  dmaengine: qcom: adm: use sg_nents_for_dma() helper
  dmaengine: pxa-dma: use sg_nents_for_dma() helper
  dmaengine: lgm: use sg_nents_for_dma() helper
  dmaengine: k3dma: use sg_nents_for_dma() helper
  dmaengine: dw-axi-dmac: use sg_nents_for_dma() helper
  dmaengine: bcm2835-dma: use sg_nents_for_dma() helper
  dmaengine: axi-dmac: use sg_nents_for_dma() helper
  dmaengine: altera-msgdma: use sg_nents_for_dma() helper
  scatterlist: introduce sg_nents_for_dma() helper
  dmaengine: idxd: Add Max SGL Size Support for DSA3.0
  dmaengine: idxd: Expose DSA3.0 capabilities through sysfs
  dmaengine: sh: rz-dmac: Make channel irq local
  dmaengine: pl08x: Fix comment stating the difference between PL080 and PL081
  ...
2026-02-17 11:47:17 -08:00