Commit Graph

48222 Commits (eb7fd7aa35bfcc1e1fda4ecc42ccfcb526cdc780)

Author SHA1 Message Date
Linus Torvalds 90b83efa67 bpf-next-6.16
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmg3NqgACgkQ6rmadz2v
 bTpNUQ/8DPeYtn3nskpsP2OwFy6O3hhfCe6gjOAmUVSk000xbG+AcI/h1DnGZWgk
 xlVcEs93ekzUzHd7k1+RJ2c5yDLXieLJAtb66rbFU1enkxs2cWlcWSKE6K/gaoh3
 G1BCARVlKwtrJhrVrsXtYP/eGZxKRSUZFK7xhtCk7lp7sRI3xkTLE+FJBcDkTJ6W
 HwF14i3zO+BkqNGdFwwlASCCqRItSNBBiM3KjW1DbETOTfAKlvCTrcgdUiODqxhF
 PNnULW+xmICABDFlKfDMlUAGNlSHKjiI3+g31LdblA5eyEhIqiCRgBGFYoCnsluk
 qUauRSie61KqC7fxN3qVpC3bXJfD1td7uIvoqSkDLtTv8a5+HAoiohzi1qBzCayl
 LAGkBYewAfDtdDDjNY38JLH2RCdyY6zG9DhqghPHdPlM7zj7L5zZgj34igEwesMM
 mfj9TuFFF99yfX5UUeSxKpDGR1eO4Ew0p7tg8CRs8Fqh6AIQSmboREZrsncVRCTS
 4SDHSI4KcO4LO2pEKzy+X4dewganN7aESnQG34iG0liyvDDwJOgUnDWLRwPLas7k
 3b/zIfBLxOJpA5R+0hhAMtjMA4NgyKJf4yFZwEieuasQjvzwTApi24YhZ/b3HSEB
 2Dp8kHEEbwezv0OFFz/fJ88dNQnrDmtJ+QByN/liA8kj4Yuh2+Q=
 =j3t8
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Fix and improve BTF deduplication of identical BTF types (Alan
   Maguire and Andrii Nakryiko)

 - Support up to 12 arguments in BPF trampoline on arm64 (Xu Kuohai and
   Alexis Lothoré)

 - Support load-acquire and store-release instructions in BPF JIT on
   riscv64 (Andrea Parri)

 - Fix uninitialized values in BPF_{CORE,PROBE}_READ macros (Anton
   Protopopov)

 - Streamline allowed helpers across program types (Feng Yang)

 - Support atomic update for hashtab of BPF maps (Hou Tao)

 - Implement json output for BPF helpers (Ihor Solodrai)

 - Several s390 JIT fixes (Ilya Leoshkevich)

 - Various sockmap fixes (Jiayuan Chen)

 - Support mmap of vmlinux BTF data (Lorenz Bauer)

 - Support BPF rbtree traversal and list peeking (Martin KaFai Lau)

 - Tests for sockmap/sockhash redirection (Michal Luczaj)

 - Introduce kfuncs for memory reads into dynptrs (Mykyta Yatsenko)

 - Add support for dma-buf iterators in BPF (T.J. Mercier)

 - The verifier support for __bpf_trap() (Yonghong Song)

* tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (135 commits)
  bpf, arm64: Remove unused-but-set function and variable.
  selftests/bpf: Add tests with stack ptr register in conditional jmp
  bpf: Do not include stack ptr register in precision backtracking bookkeeping
  selftests/bpf: enable many-args tests for arm64
  bpf, arm64: Support up to 12 function arguments
  bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem()
  bpf: Avoid __bpf_prog_ret0_warn when jit fails
  bpftool: Add support for custom BTF path in prog load/loadall
  selftests/bpf: Add unit tests with __bpf_trap() kfunc
  bpf: Warn with __bpf_trap() kfunc maybe due to uninitialized variable
  bpf: Remove special_kfunc_set from verifier
  selftests/bpf: Add test for open coded dmabuf_iter
  selftests/bpf: Add test for dmabuf_iter
  bpf: Add open coded dmabuf iterator
  bpf: Add dmabuf iterator
  dma-buf: Rename debugfs symbols
  bpf: Fix error return value in bpf_copy_from_user_dynptr
  libbpf: Use mmap to parse vmlinux BTF from sysfs
  selftests: bpf: Add a test for mmapable vmlinux BTF
  btf: Allow mmap of vmlinux btf
  ...
2025-05-28 15:52:42 -07:00
Linus Torvalds 1b98f357da Networking changes for 6.16.
Core
 ----
 
  - Implement the Device Memory TCP transmit path, allowing zero-copy
    data transmission on top of TCP from e.g. GPU memory to the wire.
 
  - Move all the IPv6 routing tables management outside the RTNL scope,
    under its own lock and RCU. The route control path is now 3x times
    faster.
 
  - Convert queue related netlink ops to instance lock, reducing
    again the scope of the RTNL lock. This improves the control plane
    scalability.
 
  - Refactor the software crc32c implementation, removing unneeded
    abstraction layers and improving significantly the related
    micro-benchmarks.
 
  - Optimize the GRO engine for UDP-tunneled traffic, for a 10%
    performance improvement in related stream tests.
 
  - Cover more per-CPU storage with local nested BH locking; this is a
    prep work to remove the current per-CPU lock in local_bh_disable()
    on PREMPT_RT.
 
  - Introduce and use nlmsg_payload helper, combining buffer bounds
    verification with accessing payload carried by netlink messages.
 
 Netfilter
 ---------
 
  - Rewrite the procfs conntrack table implementation, improving
    considerably the dump performance. A lot of user-space tools
    still use this interface.
 
  - Implement support for wildcard netdevice in netdev basechain
    and flowtables.
 
  - Integrate conntrack information into nft trace infrastructure.
 
  - Export set count and backend name to userspace, for better
    introspection.
 
 BPF
 ---
 
  - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops
    programs and can be controlled in similar way to traditional qdiscs
    using the "tc qdisc" command.
 
  - Refactor the UDP socket iterator, addressing long standing issues
    WRT duplicate hits or missed sockets.
 
 Protocols
 ---------
 
  - Improve TCP receive buffer auto-tuning and increase the default
    upper bound for the receive buffer; overall this improves the single
    flow maximum thoughput on 200Gbs link by over 60%.
 
  - Add AFS GSSAPI security class to AF_RXRPC; it provides transport
    security for connections to the AFS fileserver and VL server.
 
  - Improve TCP multipath routing, so that the sources address always
    matches the nexthop device.
 
  - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS,
    and thus preventing DoS caused by passing around problematic FDs.
 
  - Retire DCCP socket. DCCP only receives updates for bugs, and major
    distros disable it by default. Its removal allows for better
    organisation of TCP fields to reduce the number of cache lines hit
    in the fast path.
 
  - Extend TCP drop-reason support to cover PAWS checks.
 
 Driver API
 ----------
 
  - Reorganize PTP ioctl flag support to require an explicit opt-in for
    the drivers, avoiding the problem of drivers not rejecting new
    unsupported flags.
 
  - Converted several device drivers to timestamping APIs.
 
  - Introduce per-PHY ethtool dump helpers, improving the support for
    dump operations targeting PHYs.
 
 Tests and tooling
 -----------------
 
  - Add support for classic netlink in user space C codegen, so that
    ynl-c can now read, create and modify links, routes addresses and
    qdisc layer configuration.
 
  - Add ynl sub-types for binary attributes, allowing ynl-c to output
    known struct instead of raw binary data, clarifying the classic
    netlink output.
 
  - Extend MPTCP selftests to improve the code-coverage.
 
  - Add tests for XDP tail adjustment in AF_XDP.
 
 New hardware / drivers
 ----------------------
 
  - OpenVPN virtual driver: offload OpenVPN data channels processing
    to the kernel-space, increasing the data transfer throughput WRT
    the user-space implementation.
 
  - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC.
 
  - Broadcom asp-v3.0 ethernet driver.
 
  - AMD Renoir ethernet device.
 
  - ReakTek MT9888 2.5G ethernet PHY driver.
 
  - Aeonsemi 10G C45 PHYs driver.
 
 Drivers
 -------
 
  - Ethernet high-speed NICs:
    - nVidia/Mellanox (mlx5):
      - refactor the stearing table handling to reduce significantly
        the amount of memory used
      - add support for complex matches in H/W flow steering
      - improve flow streeing error handling
      - convert to netdev instance locking
    - Intel (100G, ice, igb, ixgbe, idpf):
      - ice: add switchdev support for LLDP traffic over VF
      - ixgbe: add firmware manipulation and regions devlink support
      - igb: introduce support for frame transmission premption
      - igb: adds persistent NAPI configuration
      - idpf: introduce RDMA support
      - idpf: add initial PTP support
    - Meta (fbnic):
      - extend hardware stats coverage
      - add devlink dev flash support
    - Broadcom (bnxt):
      - add support for RX-side device memory TCP
    - Wangxun (txgbe):
      - implement support for udp tunnel offload
      - complete PTP and SRIOV support for AML 25G/10G devices
 
  - Ethernet NICs embedded and virtual:
    - Google (gve):
      - add device memory TCP TX support
    - Amazon (ena):
      - support persistent per-NAPI config
    - Airoha:
      - add H/W support for L2 traffic offload
      - add per flow stats for flow offloading
    - RealTek (rtl8211): add support for WoL magic packet
    - Synopsys (stmmac):
      - dwmac-socfpga 1000BaseX support
      - add Loongson-2K3000 support
      - introduce support for hardware-accelerated VLAN stripping
    - Broadcom (bcmgenet):
      - expose more H/W stats
    - Freescale (enetc, dpaa2-eth):
      - enetc: add MAC filter, VLAN filter RSS and loopback support
      - dpaa2-eth: convert to H/W timestamping APIs
    - vxlan: convert FDB table to rhashtable, for better scalabilty
    - veth: apply qdisc backpressure on full ring to reduce TX drops
 
  - Ethernet switches:
    - Microchip (kzZ88x3): add ETS scheduler support
 
  - Ethernet PHYs:
    - RealTek (rtl8211):
      - add support for WoL magic packet
      - add support for PHY LEDs
 
  - CAN:
    - Adds RZ/G3E CANFD support to the rcar_canfd driver.
    - Preparatory work for CAN-XL support.
    - Add self-tests framework with support for CAN physical interfaces.
 
  - WiFi:
    - mac80211:
      - scan improvements with multi-link operation (MLO)
    - Qualcomm (ath12k):
      - enable AHB support for IPQ5332
      - add monitor interface support to QCN9274
      - add multi-link operation support to WCN7850
      - add 802.11d scan offload support to WCN7850
      - monitor mode for WCN7850, better 6 GHz regulatory
    - Qualcomm (ath11k):
      - restore hibernation support
    - MediaTek (mt76):
      - WiFi-7 improvements
      - implement support for mt7990
    - Intel (iwlwifi):
      - enhanced multi-link single-radio (EMLSR) support on 5 GHz links
      - rework device configuration
    - RealTek (rtw88):
      - improve throughput for RTL8814AU
    - RealTek (rtw89):
      - add multi-link operation support
      - STA/P2P concurrency improvements
      - support different SAR configs by antenna
 
  - Bluetooth:
    - introduce HCI Driver protocol
    - btintel_pcie: do not generate coredump for diagnostic events
    - btusb: add HCI Drv commands for configuring altsetting
    - btusb: add RTL8851BE device 0x0bda:0xb850
    - btusb: add new VID/PID 13d3/3584 for MT7922
    - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925
    - btnxpuart: implement host-wakeup feature
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmg3D64SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkcIsQAK2eEc+BxQer975wzvtMg6gF9eoex4a+
 rZ7jxfDzDtNvTauoQsrpehDZp0FnySaVGCU36lHGB2OvDnhCpPc5hXzKDWQpOuqQ
 SHrGG3/6FTbdTG/HfHUcbNyrUzIf53SADSObiQ3qg4gyEQ3sCpcOKtVtMcU8rvsY
 /HqMnsJWFaROUMjMtCcnUSgjmeY9kBvha3sTXUqgeRugEOCvZD7z4rpqFIcQqHw7
 e2Fi8dwIXEYNxqPp6MRq2qdyUTewCRruE8ZIMAFuhtfYeMElUZMPlqlMENX3AzTQ
 cr0EgwcFOUxRA7oZRxhoBNBsVXavtSpQr4ZDoWplxP4aQ37n5tc1E9Q72axpB/Og
 FbJRl6GvWYnCd8071BczgmfHlKaTAigPvt2Z4r6JjM5I/Bij/IZ3k+On1OTuOAj/
 EqfFkdZ0a5cfKrwUMP+oSGtSAywkMVUtnIKJlZeRbjSj2432sCfe2jVAlS8ELM43
 3LUgXYrAKtA87g171LlsRu5EEpI5QmqPb+i5LpPlEXe2TJEgPisyfecJ3NafF/2+
 j575lm+TFNm9NTNhGGjDPEvw0djI5wSGGMe9J4gC74eWi6s5t6C4cuUf84TKWdwR
 x+9H0IB7rfFncAwXHJuUUtzd+fPHaYzs5dDGbSgMQOXr1cr1wlubCK8mQ1r/Wt/a
 3GjFIOQKW2Q5
 =t/Tz
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "Core:

   - Implement the Device Memory TCP transmit path, allowing zero-copy
     data transmission on top of TCP from e.g. GPU memory to the wire.

   - Move all the IPv6 routing tables management outside the RTNL scope,
     under its own lock and RCU. The route control path is now 3x times
     faster.

   - Convert queue related netlink ops to instance lock, reducing again
     the scope of the RTNL lock. This improves the control plane
     scalability.

   - Refactor the software crc32c implementation, removing unneeded
     abstraction layers and improving significantly the related
     micro-benchmarks.

   - Optimize the GRO engine for UDP-tunneled traffic, for a 10%
     performance improvement in related stream tests.

   - Cover more per-CPU storage with local nested BH locking; this is a
     prep work to remove the current per-CPU lock in local_bh_disable()
     on PREMPT_RT.

   - Introduce and use nlmsg_payload helper, combining buffer bounds
     verification with accessing payload carried by netlink messages.

  Netfilter:

   - Rewrite the procfs conntrack table implementation, improving
     considerably the dump performance. A lot of user-space tools still
     use this interface.

   - Implement support for wildcard netdevice in netdev basechain and
     flowtables.

   - Integrate conntrack information into nft trace infrastructure.

   - Export set count and backend name to userspace, for better
     introspection.

  BPF:

   - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops
     programs and can be controlled in similar way to traditional qdiscs
     using the "tc qdisc" command.

   - Refactor the UDP socket iterator, addressing long standing issues
     WRT duplicate hits or missed sockets.

  Protocols:

   - Improve TCP receive buffer auto-tuning and increase the default
     upper bound for the receive buffer; overall this improves the
     single flow maximum thoughput on 200Gbs link by over 60%.

   - Add AFS GSSAPI security class to AF_RXRPC; it provides transport
     security for connections to the AFS fileserver and VL server.

   - Improve TCP multipath routing, so that the sources address always
     matches the nexthop device.

   - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS,
     and thus preventing DoS caused by passing around problematic FDs.

   - Retire DCCP socket. DCCP only receives updates for bugs, and major
     distros disable it by default. Its removal allows for better
     organisation of TCP fields to reduce the number of cache lines hit
     in the fast path.

   - Extend TCP drop-reason support to cover PAWS checks.

  Driver API:

   - Reorganize PTP ioctl flag support to require an explicit opt-in for
     the drivers, avoiding the problem of drivers not rejecting new
     unsupported flags.

   - Converted several device drivers to timestamping APIs.

   - Introduce per-PHY ethtool dump helpers, improving the support for
     dump operations targeting PHYs.

  Tests and tooling:

   - Add support for classic netlink in user space C codegen, so that
     ynl-c can now read, create and modify links, routes addresses and
     qdisc layer configuration.

   - Add ynl sub-types for binary attributes, allowing ynl-c to output
     known struct instead of raw binary data, clarifying the classic
     netlink output.

   - Extend MPTCP selftests to improve the code-coverage.

   - Add tests for XDP tail adjustment in AF_XDP.

  New hardware / drivers:

   - OpenVPN virtual driver: offload OpenVPN data channels processing to
     the kernel-space, increasing the data transfer throughput WRT the
     user-space implementation.

   - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC.

   - Broadcom asp-v3.0 ethernet driver.

   - AMD Renoir ethernet device.

   - ReakTek MT9888 2.5G ethernet PHY driver.

   - Aeonsemi 10G C45 PHYs driver.

  Drivers:

   - Ethernet high-speed NICs:
       - nVidia/Mellanox (mlx5):
           - refactor the steering table handling to significantly
             reduce the amount of memory used
           - add support for complex matches in H/W flow steering
           - improve flow streeing error handling
           - convert to netdev instance locking
       - Intel (100G, ice, igb, ixgbe, idpf):
           - ice: add switchdev support for LLDP traffic over VF
           - ixgbe: add firmware manipulation and regions devlink support
           - igb: introduce support for frame transmission premption
           - igb: adds persistent NAPI configuration
           - idpf: introduce RDMA support
           - idpf: add initial PTP support
       - Meta (fbnic):
           - extend hardware stats coverage
           - add devlink dev flash support
       - Broadcom (bnxt):
           - add support for RX-side device memory TCP
       - Wangxun (txgbe):
           - implement support for udp tunnel offload
           - complete PTP and SRIOV support for AML 25G/10G devices

   - Ethernet NICs embedded and virtual:
       - Google (gve):
           - add device memory TCP TX support
       - Amazon (ena):
           - support persistent per-NAPI config
       - Airoha:
           - add H/W support for L2 traffic offload
           - add per flow stats for flow offloading
       - RealTek (rtl8211): add support for WoL magic packet
       - Synopsys (stmmac):
           - dwmac-socfpga 1000BaseX support
           - add Loongson-2K3000 support
           - introduce support for hardware-accelerated VLAN stripping
       - Broadcom (bcmgenet):
           - expose more H/W stats
       - Freescale (enetc, dpaa2-eth):
           - enetc: add MAC filter, VLAN filter RSS and loopback support
           - dpaa2-eth: convert to H/W timestamping APIs
       - vxlan: convert FDB table to rhashtable, for better scalabilty
       - veth: apply qdisc backpressure on full ring to reduce TX drops

   - Ethernet switches:
       - Microchip (kzZ88x3): add ETS scheduler support

   - Ethernet PHYs:
       - RealTek (rtl8211):
           - add support for WoL magic packet
           - add support for PHY LEDs

   - CAN:
       - Adds RZ/G3E CANFD support to the rcar_canfd driver.
       - Preparatory work for CAN-XL support.
       - Add self-tests framework with support for CAN physical interfaces.

   - WiFi:
       - mac80211:
           - scan improvements with multi-link operation (MLO)
       - Qualcomm (ath12k):
           - enable AHB support for IPQ5332
           - add monitor interface support to QCN9274
           - add multi-link operation support to WCN7850
           - add 802.11d scan offload support to WCN7850
           - monitor mode for WCN7850, better 6 GHz regulatory
       - Qualcomm (ath11k):
           - restore hibernation support
       - MediaTek (mt76):
           - WiFi-7 improvements
           - implement support for mt7990
       - Intel (iwlwifi):
           - enhanced multi-link single-radio (EMLSR) support on 5 GHz links
           - rework device configuration
       - RealTek (rtw88):
           - improve throughput for RTL8814AU
       - RealTek (rtw89):
           - add multi-link operation support
           - STA/P2P concurrency improvements
           - support different SAR configs by antenna

   - Bluetooth:
       - introduce HCI Driver protocol
       - btintel_pcie: do not generate coredump for diagnostic events
       - btusb: add HCI Drv commands for configuring altsetting
       - btusb: add RTL8851BE device 0x0bda:0xb850
       - btusb: add new VID/PID 13d3/3584 for MT7922
       - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925
       - btnxpuart: implement host-wakeup feature"

* tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits)
  selftests/bpf: Fix bpf selftest build warning
  selftests: netfilter: Fix skip of wildcard interface test
  net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames
  net: openvswitch: Fix the dead loop of MPLS parse
  calipso: Don't call calipso functions for AF_INET sk.
  selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem
  net_sched: hfsc: Address reentrant enqueue adding class to eltree twice
  octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback
  octeontx2-pf: QOS: Perform cache sync on send queue teardown
  net: mana: Add support for Multi Vports on Bare metal
  net: devmem: ncdevmem: remove unused variable
  net: devmem: ksft: upgrade rx test to send 1K data
  net: devmem: ksft: add 5 tuple FS support
  net: devmem: ksft: add exit_wait to make rx test pass
  net: devmem: ksft: add ipv4 support
  net: devmem: preserve sockc_err
  page_pool: fix ugly page_pool formatting
  net: devmem: move list_add to net_devmem_bind_dmabuf.
  selftests: netfilter: nft_queue.sh: include file transfer duration in log message
  net: phy: mscc: Fix memory leak when using one step timestamping
  ...
2025-05-28 15:24:36 -07:00
Linus Torvalds 3d413f0cfd audit-pr-20250527
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmg17ccUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPkCxAA4jAZfBVGFbaB9sPKlAOnkeIwI2iX
 HNXAwmG1GHjxmCqQSY84bPYmllkNkENHbxFmtLOdWtlWL+1JTPKjylasPvzDIUsW
 rqKyGUwJS5cZyIG0iyUGQEoTNlT4MbwvBBWZrLStCL+VRCyVOzyDo/kvs+AcxOYP
 vKuZZG9ke2roAnrgKrbZFy0kvzTFzcUJPMOwZIE/WneNogHY23+LzcTy06DVtVLt
 DRwzOSgPX99hJjtDTN5G8o7wY1FjlNlad4z3tuR+kEQsIImSgB1mZTnhCx5xNwfr
 LhujN2Po8EPKBA7U0AEtwmO2yl2OL69QlEuveEMl7SxFxSnMrnTlIpGr7EOmxcrS
 PFUBkOEAYfWTPtjxtbs1mYgJRcDlsLV2M0xJg58aESImMxcZPZ9oheHcAZX/A/eR
 V5bNR6I9CFkbSkrH2JG810AMB3NmNEw6ztH/vhqW1x8xYP8M/AxQmqYw00xkjQby
 3Qaek1+fIy1chdAJW19BKux38YRxYY48UosA73/G94Dm3N4C99zHaqOMmguaZJJ7
 bAxPNe+cBcMdf/XAw5TngihOXEs2n2qpRkN3K+RzGsqBcRBA/pTBM128mANxN7Ra
 MHj16OUc9m91TjRxPWgoL+g8DtQf9pM9T9DNL72cki2DeN1JBN8XuaKu+n04keKt
 heaVbCBYncs/pJw=
 =T6/o
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20250527' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:

 - Always record AUDIT_ANOM events when auditing is enabled.

   Prior to this patch we only recorded AUDIT_ANOM events if auditing
   was enabled and the admin/distro had explicitly configured audit
   beyond the defaults. Considering that AUDIT_ANOM events are anomolous
   events considered to be "security relevant", it seems wise to record
   these events as long as auditing is enabled, even if the system is
   running with a default audit configuration.

 - Mark the audit_log_vformat() function with the __printf() attribute
   to quiet GCC.

* tag 'audit-pr-20250527' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: record AUDIT_ANOM_* events regardless of presence of rules
  audit: mark audit_log_vformat() with __printf() attribute
2025-05-28 08:34:19 -07:00
Linus Torvalds 7af6e3febb integrity-v6.16
-----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCaDYRjRQcem9oYXJAbGlu
 dXguaWJtLmNvbQAKCRDLwZzRsCrn5fpRAQDsdIIwCgyQLFQhZq3wW5dhTUBQW8o9
 GjaNHpROKV57cwD9GqT78xi9qsxgaYW0lUUh5+zvlGI5cAtnl8/Fkby7hgY=
 =ohlH
 -----END PGP SIGNATURE-----

Merge tag 'integrity-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity updates from Mimi Zohar:
 "Carrying the IMA measurement list across kexec is not a new feature,
  but is updated to address a couple of issues:

   - Carrying the IMA measurement list across kexec required knowing
     apriori all the file measurements between the "kexec load" and
     "kexec execute" in order to measure them before the "kexec load".
     Any delay between the "kexec load" and "kexec exec" exacerbated the
     problem.

   - Any file measurements post "kexec load" were not carried across
     kexec, resulting in the measurement list being out of sync with the
     TPM PCR.

  With these changes, the buffer for the IMA measurement list is still
  allocated at "kexec load", but copying the IMA measurement list is
  deferred to after quiescing the TPM.

  Two new kexec critical data records are defined"

* tag 'integrity-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: do not copy measurement list to kdump kernel
  ima: measure kexec load and exec events as critical data
  ima: make the kexec extra memory configurable
  ima: verify if the segment size has changed
  ima: kexec: move IMA log copy from kexec load to execute
  ima: kexec: define functions to copy IMA log at soft boot
  ima: kexec: skip IMA segment validation after kexec soft reboot
  kexec: define functions to map and unmap segments
  ima: define and call ima_alloc_kexec_file_buf()
  ima: rename variable the seq_file "file" to "ima_kexec_file"
2025-05-28 08:12:33 -07:00
Linus Torvalds feacb1774b sched_ext: Changes for v6.16
- More in-kernel idle CPU selection improvements. Expand topology awareness
   coverage add scx_bpf_select_cpu_and() to allow more flexibility. The idle
   CPU selection kfuncs can now be called from unlocked contexts too.
 
 - A bunch of reorganization changes to lay the foundation for multiple
   hierarchical scheduler support. This isn't ready yet and the included
   changes don't make meaningful behavior differences. One notable change is
   replacing some static_key tests with dynamic tests as the test results may
   differ depending on the scheduler instance. This isn't expected to cause
   meaningful performance difference.
 
 - Other minor and doc updates.
 
 - There were multiple patches in for-6.15-fixes which conflicted with
   changes in for-6.16. for-6.15-fixes were pulled three times into for-6.16
   to resolve the conflicts.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaDYZMw4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGfbcAQDRloVb/d5RfC6VYlue9EV1jHuoJefTYHvR3jmO
 ju70EQEAjLBXw58XAePQ9La/570JELgsC5FzJp3tLTilGx2JyQA=
 =7cDG
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext updates from Tejun Heo:

 - More in-kernel idle CPU selection improvements. Expand topology
   awareness coverage add scx_bpf_select_cpu_and() to allow more
   flexibility. The idle CPU selection kfuncs can now be called from
   unlocked contexts too.

 - A bunch of reorganization changes to lay the foundation for multiple
   hierarchical scheduler support. This isn't ready yet and the included
   changes don't make meaningful behavior differences. One notable
   change is replacing some static_key tests with dynamic tests as the
   test results may differ depending on the scheduler instance. This
   isn't expected to cause meaningful performance difference.

 - Other minor and doc updates.

 - There were multiple patches in for-6.15-fixes which conflicted with
   changes in for-6.16. for-6.15-fixes were pulled three times into
   for-6.16 to resolve the conflicts.

* tag 'sched_ext-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (49 commits)
  sched_ext: Call ops.update_idle() after updating builtin idle bits
  sched_ext, docs: convert mentions of "CFS" to "fair-class scheduler"
  selftests/sched_ext: Update test enq_select_cpu_fails
  sched_ext: idle: Consolidate default idle CPU selection kfuncs
  selftests/sched_ext: Add test for scx_bpf_select_cpu_and() via test_run
  sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context
  sched_ext: idle: Validate locking correctness in scx_bpf_select_cpu_and()
  sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c
  sched_ext, docs: add label
  sched_ext: Explain the temporary situation around scx_root dereferences
  sched_ext: Add @sch to SCX_CALL_OP*()
  sched_ext: Cleanup [__]scx_exit/error*()
  sched_ext: Add @sch to SCX_CALL_OP*()
  sched_ext: Clean up scx_root usages
  Documentation: scheduler: Changed lowercase acronyms to uppercase
  sched_ext: Avoid NULL scx_root deref in __scx_exit()
  sched_ext: Add RCU protection to scx_root in DSQ iterator
  sched_ext: Clean up SCX_EXIT_NONE handling in scx_disable_workfn()
  sched_ext: Move disable machinery into scx_sched
  sched_ext: Move event_stats_cpu into scx_sched
  ...
2025-05-27 21:12:50 -07:00
Linus Torvalds 3b66e6b3c0 cgroup: Changes for v6.16
- cgroup rstat shared the tracking tree across all controlers with the
   rationale being that a cgroup which is using one resource is likely to be
   using other resources at the same time (ie. if something is allocating
   memory, it's probably consuming CPU cycles). However, this turned out to
   not scale very well especially with memcg using rstat for internal
   operations which made memcg stat read and flush patterns substantially
   different from other controllers. JP Kobryn split the rstat tree per
   controller.
 
 - cgroup BPF support was hooking into cgroup init/exit paths directly.
   Convert them to use a notifier chain instead so that other usages can be
   added easily. The two of the patches which implement this are mislabeled
   as belonging to sched_ext instead of cgroup. Sorry.
 
 - Relatively minor cpuset updates.
 
 - Documentation updates.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaDYUmA4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGRhbAP90v8QwUkWEKGQSam8JY3by7PvrW6pV5ot+BGuM
 4xu3BAEAjsJ9FdiwYLwKYqG7y59xhhBFOo6GpcP52kPp3znl+QQ=
 =6MIT
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - cgroup rstat shared the tracking tree across all controllers with the
   rationale being that a cgroup which is using one resource is likely
   to be using other resources at the same time (ie. if something is
   allocating memory, it's probably consuming CPU cycles).

   However, this turned out to not scale very well especially with memcg
   using rstat for internal operations which made memcg stat read and
   flush patterns substantially different from other controllers. JP
   Kobryn split the rstat tree per controller.

 - cgroup BPF support was hooking into cgroup init/exit paths directly.

   Convert them to use a notifier chain instead so that other usages can
   be added easily. The two of the patches which implement this are
   mislabeled as belonging to sched_ext instead of cgroup. Sorry.

 - Relatively minor cpuset updates

 - Documentation updates

* tag 'cgroup-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (23 commits)
  sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier
  sched_ext: Introduce cgroup_lifetime_notifier
  cgroup: Minor reorganization of cgroup_create()
  cgroup, docs: cpu controller's interaction with various scheduling policies
  cgroup, docs: convert space indentation to tab indentation
  cgroup: avoid per-cpu allocation of size zero rstat cpu locks
  cgroup, docs: be specific about bandwidth control of rt processes
  cgroup: document the rstat per-cpu initialization
  cgroup: helper for checking rstat participation of css
  cgroup: use subsystem-specific rstat locks to avoid contention
  cgroup: use separate rstat trees for each subsystem
  cgroup: compare css to cgroup::self in helper for distingushing css
  cgroup: warn on rstat usage by early init subsystems
  cgroup/cpuset: drop useless cpumask_empty() in compute_effective_exclusive_cpumask()
  cgroup/rstat: Improve cgroup_rstat_push_children() documentation
  cgroup: fix goto ordering in cgroup_init()
  cgroup: fix pointer check in css_rstat_init()
  cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUs
  cgroup/cpuset: Fix obsolete comment in cpuset_css_offline()
  cgroup/cpuset: Always use cpu_active_mask
  ...
2025-05-27 20:59:53 -07:00
Linus Torvalds 91ad250cbe workqueue: Changes for v6.16
Fix statistic update race condition and a couple documentation updates.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaDYPDw4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGdczAQDbp6wxE64DNfdVvEss07dHisEYobJVb5ojHCl6
 FkTYsQEAz/WtB5mwh5ohhemDKhOuG6PZuHqjuMGFCndpfxyQQQg=
 =WNPk
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue updates from Tejun Heo:
 "Fix statistic update race condition and a couple documentation
  updates"

* tag 'wq-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix typo in comment
  workqueue: Fix race condition in wq->stats incrementation
  workqueue: Better document teardown for delayed_work
2025-05-27 20:49:06 -07:00
Linus Torvalds f1975e4765 Summary
* Move kern_table members out of kernel/sysctl.c
 
   Moved a subset (tracing, panic, signal, stack_tracer and sparc) out of the
   kern_table array. The goal is for kern_table to only have sysctl elements. All
   this increases modularity by placing the ctl_tables closer to where they are
   used while reducing the chances of merge conflicts in kernel/sysctl.c.
 
 * Fixed sysctl unit test panic by relocating it to selftests
 
 * Testing
 
   These have been in linux-next from rc2, so they have had more than a month
   worth of testing.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmgwLsAACgkQupfNUreW
 QU9ghwv/VKZW+IXEvSjc8OiwntWkL7e5ddHY6O2Vf44MzhBefLTXmfx2HfkEA0Xw
 RaOQ28Hf/zQL83RqHHnXqI7JdGWQJUm8bCPwk4H3DCaF8qOfPVvblVYmfNL2auSY
 oyRRpRzZuY5EtKcrNjiHFHL2WIC8KvPVwS748oHY1eZY7kn1fcs8DDnNO4iuWop+
 uJeDxu87wkRCFXF3DIM+MAHRvxSa8GHtZvb9EjAl/EHMbAyVSz3uTb7FdQDdnE09
 s7P30EC03RHtgi3sd2Ku04dJsHLz7VErvpToxSH2KFlcdpJuWuCSCTT8XaD8kII8
 kYYCxNpmPOf4LzEy/J2vVZB0PSHrHvuQCH7iGy+8wOPk9GHTOMkKMMXVmeGnAsef
 AiosPYroxXp/nBFcuNs6/1LKpsdpFr2F6u6oMgbzLaW1Xe/oc+6oynuOgeVj9LuM
 FrSxSwaVvpdwHYHujYPQAAWIgKRzITiEXnCgtSyohFquKb+7E8ZspwjOqYH2xWMQ
 WwABNRqY
 =45X2
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

Pull sysctl updates from Joel Granados:

 - Move kern_table members out of kernel/sysctl.c

   Moved a subset (tracing, panic, signal, stack_tracer and sparc) out
   of the kern_table array. The goal is for kern_table to only have
   sysctl elements. All this increases modularity by placing the
   ctl_tables closer to where they are used while reducing the chances
   of merge conflicts in kernel/sysctl.c.

 - Fixed sysctl unit test panic by relocating it to selftests

* tag 'sysctl-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  sysctl: Close test ctl_headers with a for loop
  sysctl: call sysctl tests with a for loop
  sysctl: Add 0012 to test the u8 range check
  sysctl: move u8 register test to lib/test_sysctl.c
  sparc: mv sparc sysctls into their own file under arch/sparc/kernel
  stack_tracer: move sysctl registration to kernel/trace/trace_stack.c
  tracing: Move trace sysctls into trace.c
  signal: Move signal ctl tables into signal.c
  panic: Move panic ctl tables into panic.c
2025-05-27 20:43:35 -07:00
Linus Torvalds 5cf5240991 xen: branch for v6.16-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCaDQJqgAKCRCAXGG7T9hj
 viNAAP0SmAKx3R04Q90hx4d9TU1UBrT0iu2tQI7PzNmm6dR6QQD/enuEALQUk5tP
 LwDzVLgOBvqkzewQ3b6LYA2R+snmjwg=
 =M+nH
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - A fix for running as a Xen dom0 on the iMX8QXP Arm platform

 - An update of the xen.config adding XEN_UNPOPULATED_ALLOC for better
   support of PVH dom0

 - A fix of the Xen balloon driver when running without
   CONFIG_XEN_UNPOPULATED_ALLOC

 - A fix of the dm_op Xen hypercall on Arm needed to pass user space
   buffers to the hypervisor in certain configurations

* tag 'for-linus-6.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/arm: call uaccess_ttbr0_enable for dm_op hypercall
  xen/x86: fix initial memory balloon target
  xen: enable XEN_UNPOPULATED_ALLOC as part of xen.config
  xen: swiotlb: Wire up map_resource callback
2025-05-27 20:36:30 -07:00
Linus Torvalds 23022f5456 dma-mapping updates for Linux 6.16:
- new two step DMA mapping API, which is is a first step to a long path
   to provide alternatives to scatterlist and to remove hacks, abuses and
   design mistakes related to scatterlists; this new approach optimizes
   some calls to DMA-IOMMU layer and cache maintenance by batching them,
   reduces memory usage as it is no need to store mapped DMA addresses to
   unmap them, and reduces some function call overhead; it is a combination
   effort of many people, lead and developed by Christoph Hellwig and Leon
   Romanovsky
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaDRXIQAKCRCJp1EFxbsS
 RG8tAP9kgjIwMoJqfr6DC8yYraIIUuNDyhb/fZ9vPppW6Cb7aAD/cg8udjrsUu3h
 iAZBIHkYuWmkx8JG7t5/lqBc4AOC1AA=
 =F3TU
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.16-2025-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping updates from Marek Szyprowski:
 "New two step DMA mapping API, which is is a first step to a long path
  to provide alternatives to scatterlist and to remove hacks, abuses and
  design mistakes related to scatterlists.

  This new approach optimizes some calls to DMA-IOMMU layer and cache
  maintenance by batching them, reduces memory usage as it is no need to
  store mapped DMA addresses to unmap them, and reduces some function
  call overhead.  It is a combination effort of many people, lead and
  developed by Christoph Hellwig and Leon Romanovsky"

* tag 'dma-mapping-6.16-2025-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  docs: core-api: document the IOVA-based API
  dma-mapping: add a dma_need_unmap helper
  dma-mapping: Implement link/unlink ranges API
  iommu/dma: Factor out a iommu_dma_map_swiotlb helper
  dma-mapping: Provide an interface to allow allocate IOVA
  iommu: add kernel-doc for iommu_unmap_fast
  iommu: generalize the batched sync after map interface
  dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h
  PCI/P2PDMA: Refactor the p2pdma mapping helpers
2025-05-27 20:09:06 -07:00
Linus Torvalds c89756bcf4 Power management updates for 6.16-rc1
- Fix potential division-by-zero error in em_compute_costs() (Yaxiong
    Tian).
 
  - Fix typos in energy model documentation and example driver code (Moon
    Hee Lee, Atul Kumar Pant).
 
  - Rearrange the energy model management code and add a new function for
    adjusting a CPU energy model after adjusting the capacity of the
    given CPU to it (Rafael Wysocki).
 
  - Refactor cpufreq_online(), add and use cpufreq policy locking guards,
    use __free() in policy reference counting, and clean up core cpufreq
    code on top of that (Rafael Wysocki).
 
  - Fix boost handling on CPU suspend/resume and sysfs updates (Viresh
    Kumar).
 
  - Fix des_perf clamping with max_perf in amd_pstate_update() (Dhananjay
    Ugwekar).
 
  - Add offline, online and suspend callbacks to the amd-pstate driver,
    rename and use the existing amd_pstate_epp callbacks in it (Dhananjay
    Ugwekar).
 
  - Add support for the "Requested CPU Min frequency" BIOS option to the
    amd-pstate driver (Dhananjay Ugwekar).
 
  - Reset amd-pstate driver mode after running selftests (Swapnil
    Sapkal).
 
  - Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan
    Chancellor).
 
  - Add helper for governor checks to the schedutil cpufreq governor and
    move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki).
 
  - Populate the cpu_capacity sysfs entries from the intel_pstate driver
    after registering asym capacity support (Ricardo Neri).
 
  - Add support for enabling Energy-aware scheduling (EAS) to the
    intel_pstate driver when operating in the passive mode on a hybrid
    platform (Rafael Wysocki).
 
  - Drop redundant cpus_read_lock() from store_local_boost() in the
    cpufreq core (Seyediman Seyedarab).
 
  - Replace sscanf() with kstrtouint() in the cpufreq code and use a
    symbol instead of a raw number in it (Bowen Yu).
 
  - Add support for autonomous CPU performance state selection to the
    CPPC cpufreq driver (Lifeng Zheng).
 
  - OPP: Add dev_pm_opp_set_level() (Praveen Talari).
 
  - Introduce scope-based cleanup headers and mutex locking guards in OPP
    core (Viresh Kumar).
 
  - Switch OPP to use kmemdup_array() (Zhang Enpei).
 
  - Optimize bucket assignment when next_timer_ns equals KTIME_MAX in the
    menu cpuidle governor (Zhongqiu Han).
 
  - Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla).
 
  - Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem
    Bityutskiy).
 
  - Fix typos in two comments in the teo cpuidle governor (Atul Kumar
    Pant).
 
  - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
    Kalla).
 
  - Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki).
 
  - Add new devm_ functions for enabling runtime PM and runtime PM
    reference counting (Bence Csókás).
 
  - Remove size arguments from strscpy() calls in the hibernation core
    code (Thorsten Blum).
 
  - Adjust the handling of devices with asynchronous suspend enabled
    during system suspend and resume to start resuming them immediately
    after resuming their parents and to start suspending such a device
    immediately after suspending its first child (Rafael Wysocki).
 
  - Adjust messages printed during tasks freezing to avoid using
    pr_cont() (Andrew Sayers, Paul Menzel).
 
  - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
    Zhang).
 
  - Add missing wakeup source attribute relax_count to sysfs and
    remove the space character at the end ofi the string produced by
    pm_show_wakelocks() (Zijun Hu).
 
  - Add configurable pm_test delay for hibernation (Zihuan Zhang).
 
  - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
    cypd4226 device on Tegra boards from suspending prematurely (Jon
    Hunter).
 
  - Unbreak printing PM debug messages during hibernation and clean up
    some related code (Rafael Wysocki).
 
  - Add a systemd service to run cpupower and change cpupower binding's
    Makefile to use -lcpupower (John B. Wyatt IV, Francesco Poli).
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmg0xS0SHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO1AwwH/Rvgza5YBPb9JZqWJT/ZiBw7HcEWHhP1
 fNfcVU1gXPZiF0yoPfjfJua6BcLj6lyQ3d/+zWqqAcWfmRSD6HPe8yYz8qALUAqj
 RWhDa04aGj6B9bQuOjejatznYlQlkwCRT7zec+75D+dAHVMqR/Vt2LFAetCadgHe
 MQibAQmVFXu3RFkBjReTAdGzVoTXkwoZDrzdfA2aFAfMJNtJpOW4atUZvnucuctv
 VK3ZratrctCIw7yXEoB1nWSmlY7R5JlslplBfndjmmOnky3YxNr7C6paqwtbTWoF
 MiX48qkmLOGeO6gS8s/lVCDQ4oZ+UNFQvXRsM5NGjycBikhHX/dp/w4=
 =dIqJ
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "Once again, the changes are dominated by cpufreq updates, but this
  time the majority of them are cpufreq core changes, mostly related to
  the introduction of policy locking guards and __free() usage, and
  fixes related to boost handling.

  Still, there is also a significant update of the intel_pstate driver
  making it register an energy model when running on a hybrid platform
  which is used for enabling energy-aware scheduling (EAS) if the driver
  operates in the passive mode (and schedutil is used as the cpufreq
  governor for all CPUs which is the passive mode default).

  There are some amd-pstate driver updates too, for a good measure,
  including the "Requested CPU Min frequency" BIOS option support and
  new online/offline callbacks.

  In the cpuidle space, the most significant change is the addition of a
  C1 demotion on/off sysfs knob to intel_idle which should help some
  users to configure their systems more precisely. There is also the
  conversion of the PSCI cpuidle driver to a faux device one and there
  are two small updates of cpuidle governors.

  Device power management is also modified quite a bit, especially the
  handling of devices with asynchronous suspend and resume enabled
  during system transitions. They are now going to be handled more
  asynchronously during suspend transitions and somewhat less
  aggressively during resume transitions.

  Apart from the above, the operating performance points (OPP) library
  is now going to use mutex locking guards and scope-based cleanup
  helpers and there is the usual bunch of assorted fixes and code
  cleanups.

  Specifics:

   - Fix potential division-by-zero error in em_compute_costs() (Yaxiong
     Tian)

   - Fix typos in energy model documentation and example driver code
     (Moon Hee Lee, Atul Kumar Pant)

   - Rearrange the energy model management code and add a new function
     for adjusting a CPU energy model after adjusting the capacity of
     the given CPU to it (Rafael Wysocki)

   - Refactor cpufreq_online(), add and use cpufreq policy locking
     guards, use __free() in policy reference counting, and clean up
     core cpufreq code on top of that (Rafael Wysocki)

   - Fix boost handling on CPU suspend/resume and sysfs updates (Viresh
     Kumar)

   - Fix des_perf clamping with max_perf in amd_pstate_update()
     (Dhananjay Ugwekar)

   - Add offline, online and suspend callbacks to the amd-pstate driver,
     rename and use the existing amd_pstate_epp callbacks in it
     (Dhananjay Ugwekar)

   - Add support for the "Requested CPU Min frequency" BIOS option to
     the amd-pstate driver (Dhananjay Ugwekar)

   - Reset amd-pstate driver mode after running selftests (Swapnil
     Sapkal)

   - Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan
     Chancellor)

   - Add helper for governor checks to the schedutil cpufreq governor
     and move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki)

   - Populate the cpu_capacity sysfs entries from the intel_pstate
     driver after registering asym capacity support (Ricardo Neri)

   - Add support for enabling Energy-aware scheduling (EAS) to the
     intel_pstate driver when operating in the passive mode on a hybrid
     platform (Rafael Wysocki)

   - Drop redundant cpus_read_lock() from store_local_boost() in the
     cpufreq core (Seyediman Seyedarab)

   - Replace sscanf() with kstrtouint() in the cpufreq code and use a
     symbol instead of a raw number in it (Bowen Yu)

   - Add support for autonomous CPU performance state selection to the
     CPPC cpufreq driver (Lifeng Zheng)

   - OPP: Add dev_pm_opp_set_level() (Praveen Talari)

   - Introduce scope-based cleanup headers and mutex locking guards in
     OPP core (Viresh Kumar)

   - Switch OPP to use kmemdup_array() (Zhang Enpei)

   - Optimize bucket assignment when next_timer_ns equals KTIME_MAX in
     the menu cpuidle governor (Zhongqiu Han)

   - Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla)

   - Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem
     Bityutskiy)

   - Fix typos in two comments in the teo cpuidle governor (Atul Kumar
     Pant)

   - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
     Kalla)

   - Move debug runtime PM attributes to runtime_attrs[] (Rafael
     Wysocki)

   - Add new devm_ functions for enabling runtime PM and runtime PM
     reference counting (Bence Csókás)

   - Remove size arguments from strscpy() calls in the hibernation core
     code (Thorsten Blum)

   - Adjust the handling of devices with asynchronous suspend enabled
     during system suspend and resume to start resuming them immediately
     after resuming their parents and to start suspending such a device
     immediately after suspending its first child (Rafael Wysocki)

   - Adjust messages printed during tasks freezing to avoid using
     pr_cont() (Andrew Sayers, Paul Menzel)

   - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
     Zhang)

   - Add missing wakeup source attribute relax_count to sysfs and remove
     the space character at the end ofi the string produced by
     pm_show_wakelocks() (Zijun Hu)

   - Add configurable pm_test delay for hibernation (Zihuan Zhang)

   - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
     cypd4226 device on Tegra boards from suspending prematurely (Jon
     Hunter)

   - Unbreak printing PM debug messages during hibernation and clean up
     some related code (Rafael Wysocki)

   - Add a systemd service to run cpupower and change cpupower binding's
     Makefile to use -lcpupower (John B. Wyatt IV, Francesco Poli)"

* tag 'pm-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
  cpufreq: CPPC: Add support for autonomous selection
  cpufreq: Update sscanf() to kstrtouint()
  cpufreq: Replace magic number
  OPP: switch to use kmemdup_array()
  PM: freezer: Rewrite restarting tasks log to remove stray *done.*
  PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn()
  cpufreq: drop redundant cpus_read_lock() from store_local_boost()
  cpupower: do not install files to /etc/default/
  cpupower: do not call systemctl at install time
  cpupower: do not write DESTDIR to cpupower.service
  PM: sleep: Introduce pm_sleep_transition_in_progress()
  cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver()
  cpufreq: intel_pstate: Document hybrid processor support
  cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache
  cpufreq: intel_pstate: EAS support for hybrid platforms
  PM: EM: Introduce em_adjust_cpu_capacity()
  PM: EM: Move CPU capacity check to em_adjust_new_capacity()
  PM: EM: Documentation: Fix typos in example driver code
  cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas()
  PM: sleep: Introduce pm_suspend_in_progress()
  ...
2025-05-27 16:48:47 -07:00
Yonghong Song 5ffb537e41 selftests/bpf: Add tests with stack ptr register in conditional jmp
Add two tests:
  - one test has 'rX <op> r10' where rX is not r10, and
  - another test has 'rX <op> rY' where rX and rY are not r10
    but there is an early insn 'rX = r10'.

Without previous verifier change, both tests will fail.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250524041340.4046304-1-yonghong.song@linux.dev
2025-05-27 14:09:12 -07:00
Yonghong Song e2d2115e56 bpf: Do not include stack ptr register in precision backtracking bookkeeping
Yi Lai reported an issue ([1]) where the following warning appears
in kernel dmesg:
  [   60.643604] verifier backtracking bug
  [   60.643635] WARNING: CPU: 10 PID: 2315 at kernel/bpf/verifier.c:4302 __mark_chain_precision+0x3a6c/0x3e10
  [   60.648428] Modules linked in: bpf_testmod(OE)
  [   60.650471] CPU: 10 UID: 0 PID: 2315 Comm: test_progs Tainted: G           OE       6.15.0-rc4-gef11287f8289-dirty #327 PREEMPT(full)
  [   60.654385] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  [   60.656682] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  [   60.660475] RIP: 0010:__mark_chain_precision+0x3a6c/0x3e10
  [   60.662814] Code: 5a 30 84 89 ea e8 c4 d9 01 00 80 3d 3e 7d d8 04 00 0f 85 60 fa ff ff c6 05 31 7d d8 04
                       01 48 c7 c7 00 58 30 84 e8 c4 06 a5 ff <0f> 0b e9 46 fa ff ff 48 ...
  [   60.668720] RSP: 0018:ffff888116cc7298 EFLAGS: 00010246
  [   60.671075] RAX: 54d70e82dfd31900 RBX: ffff888115b65e20 RCX: 0000000000000000
  [   60.673659] RDX: 0000000000000001 RSI: 0000000000000004 RDI: 00000000ffffffff
  [   60.676241] RBP: 0000000000000400 R08: ffff8881f6f23bd3 R09: 1ffff1103ede477a
  [   60.678787] R10: dffffc0000000000 R11: ffffed103ede477b R12: ffff888115b60ae8
  [   60.681420] R13: 1ffff11022b6cbc4 R14: 00000000fffffff2 R15: 0000000000000001
  [   60.684030] FS:  00007fc2aedd80c0(0000) GS:ffff88826fa8a000(0000) knlGS:0000000000000000
  [   60.686837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   60.689027] CR2: 000056325369e000 CR3: 000000011088b002 CR4: 0000000000370ef0
  [   60.691623] Call Trace:
  [   60.692821]  <TASK>
  [   60.693960]  ? __pfx_verbose+0x10/0x10
  [   60.695656]  ? __pfx_disasm_kfunc_name+0x10/0x10
  [   60.697495]  check_cond_jmp_op+0x16f7/0x39b0
  [   60.699237]  do_check+0x58fa/0xab10
  ...

Further analysis shows the warning is at line 4302 as below:

  4294                 /* static subprog call instruction, which
  4295                  * means that we are exiting current subprog,
  4296                  * so only r1-r5 could be still requested as
  4297                  * precise, r0 and r6-r10 or any stack slot in
  4298                  * the current frame should be zero by now
  4299                  */
  4300                 if (bt_reg_mask(bt) & ~BPF_REGMASK_ARGS) {
  4301                         verbose(env, "BUG regs %x\n", bt_reg_mask(bt));
  4302                         WARN_ONCE(1, "verifier backtracking bug");
  4303                         return -EFAULT;
  4304                 }

With the below test (also in the next patch):
  __used __naked static void __bpf_jmp_r10(void)
  {
	asm volatile (
	"r2 = 2314885393468386424 ll;"
	"goto +0;"
	"if r2 <= r10 goto +3;"
	"if r1 >= -1835016 goto +0;"
	"if r2 <= 8 goto +0;"
	"if r3 <= 0 goto +0;"
	"exit;"
	::: __clobber_all);
  }

  SEC("?raw_tp")
  __naked void bpf_jmp_r10(void)
  {
	asm volatile (
	"r3 = 0 ll;"
	"call __bpf_jmp_r10;"
	"r0 = 0;"
	"exit;"
	::: __clobber_all);
  }

The following is the verifier failure log:
  0: (18) r3 = 0x0                      ; R3_w=0
  2: (85) call pc+2
  caller:
   R10=fp0
  callee:
   frame1: R1=ctx() R3_w=0 R10=fp0
  5: frame1: R1=ctx() R3_w=0 R10=fp0
  ; asm volatile ("                                 \ @ verifier_precision.c:184
  5: (18) r2 = 0x20202000256c6c78       ; frame1: R2_w=0x20202000256c6c78
  7: (05) goto pc+0
  8: (bd) if r2 <= r10 goto pc+3        ; frame1: R2_w=0x20202000256c6c78 R10=fp0
  9: (35) if r1 >= 0xffe3fff8 goto pc+0         ; frame1: R1=ctx()
  10: (b5) if r2 <= 0x8 goto pc+0
  mark_precise: frame1: last_idx 10 first_idx 0 subseq_idx -1
  mark_precise: frame1: regs=r2 stack= before 9: (35) if r1 >= 0xffe3fff8 goto pc+0
  mark_precise: frame1: regs=r2 stack= before 8: (bd) if r2 <= r10 goto pc+3
  mark_precise: frame1: regs=r2,r10 stack= before 7: (05) goto pc+0
  mark_precise: frame1: regs=r2,r10 stack= before 5: (18) r2 = 0x20202000256c6c78
  mark_precise: frame1: regs=r10 stack= before 2: (85) call pc+2
  BUG regs 400

The main failure reason is due to r10 in precision backtracking bookkeeping.
Actually r10 is always precise and there is no need to add it for the precision
backtracking bookkeeping.

One way to fix the issue is to prevent bt_set_reg() if any src/dst reg is
r10. Andrii suggested to go with push_insn_history() approach to avoid
explicitly checking r10 in backtrack_insn().

This patch added push_insn_history() support for cond_jmp like 'rX <op> rY'
operations. In check_cond_jmp_op(), if any of rX or rY is a stack pointer,
push_insn_history() will record such information, and later backtrack_insn()
will do bt_set_reg() properly for those register(s).

  [1] https://lore.kernel.org/bpf/Z%2F8q3xzpU59CIYQE@ly-workstation/

Reported by: Yi Lai <yi1.lai@linux.intel.com>

Fixes: 407958a0e9 ("bpf: encapsulate precision backtracking bookkeeping")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250524041335.4046126-1-yonghong.song@linux.dev
2025-05-27 14:09:12 -07:00
Hou Tao d496557826 bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem()
bpf_map_lookup_percpu_elem() helper is also available for sleepable bpf
program. When BPF JIT is disabled or under 32-bit host,
bpf_map_lookup_percpu_elem() will not be inlined. Using it in a
sleepable bpf program will trigger the warning in
bpf_map_lookup_percpu_elem(), because the bpf program only holds
rcu_read_lock_trace lock. Therefore, add the missed check.

Reported-by: syzbot+dce5aae19ae4d6399986@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/000000000000176a130617420310@google.com/
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250526062534.1105938-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27 10:45:59 -07:00
KaFai Wan 86bc9c7424 bpf: Avoid __bpf_prog_ret0_warn when jit fails
syzkaller reported an issue:

WARNING: CPU: 3 PID: 217 at kernel/bpf/core.c:2357 __bpf_prog_ret0_warn+0xa/0x20 kernel/bpf/core.c:2357
Modules linked in:
CPU: 3 UID: 0 PID: 217 Comm: kworker/u32:6 Not tainted 6.15.0-rc4-syzkaller-00040-g8bac8898fe39
RIP: 0010:__bpf_prog_ret0_warn+0xa/0x20 kernel/bpf/core.c:2357
Call Trace:
 <TASK>
 bpf_dispatcher_nop_func include/linux/bpf.h:1316 [inline]
 __bpf_prog_run include/linux/filter.h:718 [inline]
 bpf_prog_run include/linux/filter.h:725 [inline]
 cls_bpf_classify+0x74a/0x1110 net/sched/cls_bpf.c:105
 ...

When creating bpf program, 'fp->jit_requested' depends on bpf_jit_enable.
This issue is triggered because of CONFIG_BPF_JIT_ALWAYS_ON is not set
and bpf_jit_enable is set to 1, causing the arch to attempt JIT the prog,
but jit failed due to FAULT_INJECTION. As a result, incorrectly
treats the program as valid, when the program runs it calls
`__bpf_prog_ret0_warn` and triggers the WARN_ON_ONCE(1).

Reported-by: syzbot+0903f6d7f285e41cdf10@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/6816e34e.a70a0220.254cdc.002c.GAE@google.com
Fixes: fa9dd599b4 ("bpf: get rid of pure_initcall dependency to enable jits")
Signed-off-by: KaFai Wan <mannkafai@gmail.com>
Link: https://lore.kernel.org/r/20250526133358.2594176-1-mannkafai@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27 10:43:10 -07:00
Yonghong Song f95695f2c4 bpf: Warn with __bpf_trap() kfunc maybe due to uninitialized variable
Marc Suñé (Isovalent, part of Cisco) reported an issue where an
uninitialized variable caused generating bpf prog binary code not
working as expected. The reproducer is in [1] where the flags
“-Wall -Werror” are enabled, but there is no warning as the compiler
takes advantage of uninitialized variable to do aggressive optimization.
The optimized code looks like below:

      ; {
           0:       bf 16 00 00 00 00 00 00 r6 = r1
      ;       bpf_printk("Start");
           1:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
                    0000000000000008:  R_BPF_64_64  .rodata
           3:       b4 02 00 00 06 00 00 00 w2 = 0x6
           4:       85 00 00 00 06 00 00 00 call 0x6
      ; DEFINE_FUNC_CTX_POINTER(data)
           5:       61 61 4c 00 00 00 00 00 w1 = *(u32 *)(r6 + 0x4c)
      ;       bpf_printk("pre ipv6_hdrlen_offset");
           6:       18 01 00 00 06 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x6 ll
                    0000000000000030:  R_BPF_64_64  .rodata
           8:       b4 02 00 00 17 00 00 00 w2 = 0x17
           9:       85 00 00 00 06 00 00 00 call 0x6
      <END>

The verifier will report the following failure:
  9: (85) call bpf_trace_printk#6
  last insn is not an exit or jmp

The above verifier log does not give a clear hint about how to fix
the problem and user may take quite some time to figure out that
the issue is due to compiler taking advantage of uninitialized variable.

In llvm internals, uninitialized variable usage may generate
'unreachable' IR insn and these 'unreachable' IR insns may indicate
uninitialized variable impact on code optimization. So far, llvm
BPF backend ignores 'unreachable' IR hence the above code is generated.
With clang21 patch [2], those 'unreachable' IR insn are converted
to func __bpf_trap(). In order to maintain proper control flow
graph for bpf progs, [2] also adds an 'exit' insn after bpf_trap()
if __bpf_trap() is the last insn in the function. The new code looks like:

      ; {
           0:       bf 16 00 00 00 00 00 00 r6 = r1
      ;       bpf_printk("Start");
           1:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
                    0000000000000008:  R_BPF_64_64  .rodata
           3:       b4 02 00 00 06 00 00 00 w2 = 0x6
           4:       85 00 00 00 06 00 00 00 call 0x6
      ; DEFINE_FUNC_CTX_POINTER(data)
           5:       61 61 4c 00 00 00 00 00 w1 = *(u32 *)(r6 + 0x4c)
      ;       bpf_printk("pre ipv6_hdrlen_offset");
           6:       18 01 00 00 06 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x6 ll
                    0000000000000030:  R_BPF_64_64  .rodata
           8:       b4 02 00 00 17 00 00 00 w2 = 0x17
           9:       85 00 00 00 06 00 00 00 call 0x6
          10:       85 10 00 00 ff ff ff ff call -0x1
                    0000000000000050:  R_BPF_64_32  __bpf_trap
          11:       95 00 00 00 00 00 00 00 exit
      <END>

In kernel, a new kfunc __bpf_trap() is added. During insn
verification, any hit with __bpf_trap() will result in
verification failure. The kernel is able to provide better
log message for debugging.

With llvm patch [2] and without this patch (no __bpf_trap()
kfunc for existing kernel), e.g., for old kernels, the verifier
outputs
  10: <invalid kfunc call>
  kfunc '__bpf_trap' is referenced but wasn't resolved
Basically, kernel does not support __bpf_trap() kfunc.
This still didn't give clear signals about possible reason.

With llvm patch [2] and with this patch, the verifier outputs
  10: (85) call __bpf_trap#74479
  unexpected __bpf_trap() due to uninitialized variable?
It gives much better hints for verification failure.

  [1] https://github.com/msune/clang_bpf/blob/main/Makefile#L3
  [2] https://github.com/llvm/llvm-project/pull/131731

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250523205326.1291640-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27 09:54:20 -07:00
Yonghong Song d848bba680 bpf: Remove special_kfunc_set from verifier
Currently, the verifier has both special_kfunc_set and special_kfunc_list.
When adding a new kfunc usage to the verifier, it is often confusing
about whether special_kfunc_set or special_kfunc_list or both should
add that kfunc. For example, some kfuncs, e.g., bpf_dynptr_from_skb,
bpf_dynptr_clone, bpf_wq_set_callback_impl, does not need to be
in special_kfunc_set.

To avoid potential future confusion, special_kfunc_set is deleted
and btf_id_set_contains(&special_kfunc_set, ...) is removed.
The code is refactored with a new func check_special_kfunc(),
which contains all codes covered by original branch
  meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)

There is no functionality change.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250523205321.1291431-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27 09:52:56 -07:00
T.J. Mercier 6eab7ac7c5 bpf: Add open coded dmabuf iterator
This open coded iterator allows for more flexibility when creating BPF
programs. It can support output in formats other than text. With an open
coded iterator, a single BPF program can traverse multiple kernel data
structures (now including dmabufs), allowing for more efficient analysis
of kernel data compared to multiple reads from procfs, sysfs, or
multiple traditional BPF iterator invocations.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250522230429.941193-4-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27 09:51:25 -07:00
T.J. Mercier 76ea955349 bpf: Add dmabuf iterator
The dmabuf iterator traverses the list of all DMA buffers.

DMA buffers are refcounted through their associated struct file. A
reference is taken on each buffer as the list is iterated to ensure each
buffer persists for the duration of the bpf program execution without
holding the list mutex.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250522230429.941193-3-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27 09:51:25 -07:00
Linus Torvalds b1456f6dc1 Updates for the time/timer core code:
- Rework the initialization of the posix-timer kmem_cache and move the
     cache pointer into the timer_data structure to prevent false sharing.
 
   - Switch the alarmtimer code to lock guards.
 
   - Improve the CPU selection criteria in the per CPU validation of the
     clocksource watchdog to avoid arbitrary selections (or omissions) on
     systems with a small number of CPUs.
 
   - The usual cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzgwkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofL9D/9aiPT3UNkEVJzzQMIjeghi2foKcyqW
 ut+0XH0+8bsFrzNKPs/PL9chIcblamm63FjwKmVxKaTFakP9omGUEMKAXIcy7L10
 UWoQ7kSLvN/+3RB4JwOavtkNtdxkcjhDso+pd1VP0t7BQ5EsFRg4zkGHx1+PO/8C
 H1URzpfmYLZWBPvIHfvgwFy5PAwwppehDynbxrR8uatg8kLvXUUGQRu/yrOYrqx8
 7a/4jFkh75QdsezYOrS6yMjCS0qEeg6l37AW1WLQplZqHxJ4Mmwx9aL890KTQkXO
 MZhtcZ1Iqa/7KdDNw1yzaW9T9t5RzND5IwEbBrLVBoeQft+P/Y3Grax5pHh+Gt8u
 Sj4+4OiyhxQbhOcKGKjTr5pnHc//+jlm1QyLd3Ri6GL+mZB0JTnfmsFDRkhcWORN
 05NcPxfganbJdENStYXZYuEIMXKnp1si8JaEbyI0AfgN8hpWITbVLnMZ65ngdOZ9
 ym2HY2+3V5uCPPxegV2HdYX8VfaIRekiJAJ+Ttt3shG3+QflUNVSL4C6tv+lNNqa
 PPV543ojVmlYWximdhc9KhT12yevhUWiuoFK50lndgbL9vDs471Ua/KXqrf/um0h
 j8n49t8Ioq1Ht7g+olN0P3L+w0qSPM2oMWFnHh6bKTQniBrnkRDOtxQC6FC1pwEI
 JVnpRMAAuSmjVQ==
 =0+bf
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer core updates from Thomas Gleixner:
 "Updates for the time/timer core code:

   - Rework the initialization of the posix-timer kmem_cache and move
     the cache pointer into the timer_data structure to prevent false
     sharing

   - Switch the alarmtimer code to lock guards

   - Improve the CPU selection criteria in the per CPU validation of the
     clocksource watchdog to avoid arbitrary selections (or omissions)
     on systems with a small number of CPUs

   - The usual cleanups and improvements"

* tag 'timers-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/nohz: Remove unused tick_nohz_full_add_cpus_to()
  clocksource: Fix the CPUs' choice in the watchdog per CPU verification
  alarmtimer: Switch spin_{lock,unlock}_irqsave() to guards
  alarmtimer: Remove dead return value in clock2alarm()
  time/jiffies: Change register_refined_jiffies() to void __init
  timers: Remove unused __round_jiffies(_up)
  posix-timers: Initialize cache early and move pointer into __timer_data
2025-05-27 09:04:15 -07:00
Linus Torvalds 5e8bbb2caa Another set of timer API cleanups:
- Convert init_timer*(), try_to_del_timer_sync() and
    destroy_timer_on_stack() over to the canonical timer_*() namespace
    convention.
 
 There are is another large converstion pending, which has not been included
 because it would have caused a gazillion of merge conflicts in next. The
 conversion scripts will be run towards the end of the merge window and a
 pull request sent once all conflict dependencies have been merged.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzgTkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodwVD/97rF1Juqm1JZNIZPN/vMqwCxRoUkc6
 tsK0+UC7UXusuJadxJ+Bsv25iPF+qejnThMU+SQ5yTVj/PNfxOe0WPdCEGGiL8Ye
 2JCk6GqSOB/360SlLmtR1B1xHDwsuuUcQTz0w57CH66HRV5vpoWSMSwj/ypy+8nU
 PlgjItaxdCKa9NJ+SUJZPWIxRkt/PsA1kwlV1OcxkgB++IiIHQEbPxECq9mlzWXF
 b4Sq/Sdf2OmEePN+DYoey4fneRwJnkjkeX/o+CqosCPHRIiWUlSu5W/lU5IYojM3
 s3XpMNNg/z8PMXR4JA2VaPYWLUZyBOs+3dM7Y6Am+z55EoxMxfzg6pGx2tfM4ftl
 vF8wG3Z1c9MmpLk+P9LatNvfHeVLNve8KgOLa5phMDQ/El/a8KqLu6HmRDPONvKp
 d6iXdPq1CP8P6jOtlFfzLmKPShgEcp+Zz9W3CaQR/0ZJEsEqrpKOLzdT86hJhBV0
 mBCdzixmGtKAh0BdPdmg2FCLScqER3HKIJhZSdV8I+jSETIHCuMiIfbMXR7iwm/H
 R1/ayvxrbc1mPseo28scqvo7m6cn5BFBxIUf4Sokp52ZCapz1v2aWzo4vHI0cTgT
 ZOjlTrf+fgYLn1dqdD45TJiQPnmRrw4dU+WWSFRFJY2qjfyucj80vdqdkE5zkp5b
 UPomlVimG4ccPg==
 =FHGU
 -----END PGP SIGNATURE-----

Merge tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer cleanups from Thomas Gleixner:
 "Another set of timer API cleanups:

    - Convert init_timer*(), try_to_del_timer_sync() and
      destroy_timer_on_stack() over to the canonical timer_*()
      namespace convention.

  There is another large conversion pending, which has not been included
  because it would have caused a gazillion of merge conflicts in next.
  The conversion scripts will be run towards the end of the merge window
  and a pull request sent once all conflict dependencies have been
  merged"

* tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack()
  treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try()
  timers: Rename init_timers() as timers_init()
  timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA
  timers: Rename __init_timer_on_stack() as __timer_init_on_stack()
  timers: Rename __init_timer() as __timer_init()
  timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack()
  timers: Rename init_timer_key() as timer_init_key()
2025-05-27 08:31:21 -07:00
Linus Torvalds 44ed0f35df Updates for the MSI subsystem (core code and PCI):
- Switch the MSI decriptor locking to lock guards
 
   - Replace a broken and naive implementation of PCI/MSI-X control word
     updates in the PCI/TPH driver with a properly serialized variant in the
     PCI/MSI core code.
 
   - Remove the MSI descriptor abuse in the SCCI/UFS/QCOM driver by
     replacing the direct access to the MSI descriptors with the proper API
     function calls. People will never understand that APIs exist for a
     reason...
 
   - Provide core infrastructre for the upcoming PCI endpoint library
     extensions. Currently limited to ARM GICv3+, but in theory extensible
     to other architectures.
 
   - Provide a MSI domain::teardown() callback, which allows drivers to undo
     the effects of the prepare() callback.
 
   - Move the MSI domain::prepare() callback invocation to domain creation
     time to avoid redundant (and in case of ARM/GIC-V3-ITS confusing)
     invocations on every allocation.
 
     In combination with the new teardown callback this removes some ugly
     hacks in the GIC-V3-ITS driver, which pretended to work around the
     short comings of the core code so far. With this update the code is
     correct by design and implementation.
 
   - Make the irqchip MSI library globally available, provide a MSI parent
     domain creation helper and convert a bunch of (PCI/)MSI drivers over to
     the modern MSI parent mechanism. This is the first step to get rid of
     at least one incarnation of the three PCI/MSI management schemes.
 
   - The usual small cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzgFsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoR0KD/402K12tlI/D70H2aTG25dbTx+dkVk+
 pKpJz0985uUlLJiPCR54dZL0ofcfRU+CdjEIf1I+6TPshtg6IWLJCfqu7OWVPYzz
 2lJDO0yeUGwJqc0CIa1vttvJWvcUcxfWBX/ZSkOIM5avaXqSwRwsFNfd7TQ+T+eG
 79VS1yyW197mUva53ekSF2voa8EEPWfEslAjoX1dRg5d4viAxaLtKm/KpBqo1oPh
 Eb+E67xEWiIonvWNdr1AOisxnbi19PyDo1xnftgBToaeXXYBodNrNIAfAkx40YUZ
 IZQLHvhZ91x15hXYIS4Cz1RXqPECbu/tHxs4AFUgGvqdgJUF89wzI3C21ymrKA6E
 tDlWfpIcuE3vV/bsqj1gHGL5G5m1tyBRgIdIAOOmMoTHvwp5rrQtuZzpuqzGmEzj
 iVIHnn5m08kRpOZQc7+PlxQMh3eunEyj9WWG49EJgoAnJPb5lou4shTwBUheHcKm
 NXxKsfo4x5C+WehGTxv80UlnMcK3Yh/TuWf2OPR6QuT2iHP2VL5jyHjIs0ICn0cp
 1tvSJtdc1rgvk/4Vn4lu5eyVaTx5ZAH8ZXNQfwwBTWTp3ZyAW+7GkaCq3LPaNJoZ
 4LWpgZ5gs6wT+1XNT3boKdns81VolmeTI8P1ciQKpUtaTt6Cy9P/i2az/J+BCS4U
 Fn5Qqk08PHGrUQ==
 =OBMj
 -----END PGP SIGNATURE-----

Merge tag 'irq-msi-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull MSI updates from Thomas Gleixner:
 "Updates for the MSI subsystem (core code and PCI):

   - Switch the MSI descriptor locking to lock guards

   - Replace a broken and naive implementation of PCI/MSI-X control word
     updates in the PCI/TPH driver with a properly serialized variant in
     the PCI/MSI core code.

   - Remove the MSI descriptor abuse in the SCCI/UFS/QCOM driver by
     replacing the direct access to the MSI descriptors with the proper
     API function calls. People will never understand that APIs exist
     for a reason...

   - Provide core infrastructre for the upcoming PCI endpoint library
     extensions. Currently limited to ARM GICv3+, but in theory
     extensible to other architectures.

   - Provide a MSI domain::teardown() callback, which allows drivers to
     undo the effects of the prepare() callback.

   - Move the MSI domain::prepare() callback invocation to domain
     creation time to avoid redundant (and in case of ARM/GIC-V3-ITS
     confusing) invocations on every allocation.

     In combination with the new teardown callback this removes some
     ugly hacks in the GIC-V3-ITS driver, which pretended to work around
     the short comings of the core code so far. With this update the
     code is correct by design and implementation.

   - Make the irqchip MSI library globally available, provide a MSI
     parent domain creation helper and convert a bunch of (PCI/)MSI
     drivers over to the modern MSI parent mechanism. This is the first
     step to get rid of at least one incarnation of the three PCI/MSI
     management schemes.

   - The usual small cleanups and improvements"

* tag 'irq-msi-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  PCI/MSI: Use bool for MSI enable state tracking
  PCI: tegra: Convert to MSI parent infrastructure
  PCI: xgene: Convert to MSI parent infrastructure
  PCI: apple: Convert to MSI parent infrastructure
  irqchip/msi-lib: Honour the MSI_FLAG_NO_AFFINITY flag
  irqchip/mvebu: Convert to msi_create_parent_irq_domain() helper
  irqchip/gic: Convert to msi_create_parent_irq_domain() helper
  genirq/msi: Add helper for creating MSI-parent irq domains
  irqchip: Make irq-msi-lib.h globally available
  irqchip/gic-v3-its: Use allocation size from the prepare call
  genirq/msi: Engage the .msi_teardown() callback on domain removal
  genirq/msi: Move prepare() call to per-device allocation
  irqchip/gic-v3-its: Implement .msi_teardown() callback
  genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare()
  irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask
  dt-bindings: PCI: pci-ep: Add support for iommu-map and msi-map
  irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS
  irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable()
  platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
  genirq/msi: Rename msi_[un]lock_descs()
  ...
2025-05-27 08:15:26 -07:00
Linus Torvalds 2bd1bea5fa A set of cleanups for the generic interrupt subsystem:
- Consolidate on one set of functions for the interrupt domain code to
     get rid of pointlessly duplicated code with only marginal different
     semantics.
 
   - Update the documentation accordingly and consolidate the coding style
     of the irqdomain header.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzd+MTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodTRD/0RmG5tngCbEJmTw6lPDQzRZH4OO3ja
 yRYlyBipemoRmvJRGjV4uHqN2QPrdOuoqMuyBO1aWcMdkpww5bAHcbgSFrlGM1lW
 kqtaxVMbufPiLQSGYe7OQf478CE1ykoBd5Va8whFKrtA73qEUdEMfWT0stspg780
 7BlmQOemL91p7Ytf03FbDdo8tZ5Xu9uXGAulwY9FZsFtsCNyvhl7nOv5Sk8ZQtGO
 xHRCeunjZLWR+IaK59hdakvQybXwSnjT6jODp96nlyKABEKSPShGSPFDWd3g9px7
 4911QwgnvTbcrsk6YmQEmPIOgXZzypjbnjpJr8tFpTbkVIy+6chi5cBJzXoqsUaM
 ylTwFcUQNvcP8yF447qb+nyPFKM5xsC07W0UpZMuJUDmhhPRtDm5pK0jpsif96GP
 l4aMsWe65PUmXHQqLdE89RJXAa8XQ2qspKVtNKq9DmEVgTviQ09Z9SSQIx4U0yIx
 w+YPde8kH2+O+YtMUn/MmfHhUP4MKya7j5zd8Bnv8wLBi7XGPPA5EKKh9I0dz9m+
 X94lweNXyH+Q8U9mt2cQf8VG8Yzgk0eeC0sliJIlybwRgEgRcQbVWw0VvZUA1ySa
 VBlaj3SinO90FEQ0CctT51ss2mUJ/XsGCnxpiGZXfqIZzFbyD1YfZQnXJH0H67DI
 CqdHw22I27Mu/A==
 =9nLp
 -----END PGP SIGNATURE-----

Merge tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq cleanups from Thomas Gleixner:
 "A set of cleanups for the generic interrupt subsystem:

   - Consolidate on one set of functions for the interrupt domain code
     to get rid of pointlessly duplicated code with only marginal
     different semantics.

   - Update the documentation accordingly and consolidate the coding
     style of the irqdomain header"

* tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  irqdomain: Consolidate coding style
  irqdomain: Fix kernel-doc and add it to Documentation
  Documentation: irqdomain: Update it
  Documentation: irq-domain.rst: Simple improvements
  Documentation: irq/concepts: Minor improvements
  Documentation: irq/concepts: Add commas and reflow
  irqdomain: Improve kernel-docs of functions
  irqdomain: Make struct irq_domain_info variables const
  irqdomain: Use irq_domain_instantiate()'s return value as initializers
  irqdomain: Drop irq_linear_revmap()
  pinctrl: keembay: Switch to irq_find_mapping()
  irqchip/armada-370-xp: Switch to irq_find_mapping()
  gpu: ipu-v3: Switch to irq_find_mapping()
  gpio: idt3243x: Switch to irq_find_mapping()
  sh: Switch to irq_find_mapping()
  powerpc: Switch to irq_find_mapping()
  irqdomain: Drop irq_domain_add_*() functions
  powerpc: Switch irq_domain_add_nomap() to use fwnode
  thermal: Switch to irq_domain_create_linear()
  soc: Switch to irq_domain_create_*()
  ...
2025-05-27 08:07:32 -07:00
Linus Torvalds c0f182c979 Update for interrupt chip drivers:
- Convert the generic interrupt chip to lock guards to remove copy &
     pasta boilerplate code and gotos.
 
   - A new driver fot the interrupt controller in the EcoNet EN751221 MIPS SoC.
 
   - Extend the SG2042-MSI driver to support the new SG2044 SoC
 
   - Updates and cleanups for the (ancient) VT8500 driver
 
   - Improve the scalability of the ARM GICV4.1 ITS driver by utilizing node
     local copies a VM's interrupt translation table when possible. This
     results in a 12% reduction of VM IPI latency in certain workloads.
 
   - The usual cleanups and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzfSwTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoc4lD/0U24B8okpp2PxVVZOtNzWgl7kcAQSJ
 2U834ep1DhqJPNW0JjT+5Lb55NfAEN/uCuirjLZDsKYNNel4LXhAY951BCJMytYX
 ebH/J7wGjEphRogxn9QTGGC/mguThwFnOiqOLq4aU0Sq/oRH6Uj+P6hMod7ym9bn
 P+bZv9WWhLQQ3x/RimcauReCEDW6pW2soQV+zhN+xTxTW+R1zRcksz1x4+b/B7Vk
 ZH6KFBpZJyC34T0aXOJFhrEo01z2iZWifgmX1zz2ZgZjeUklFxtW9vGqBRS0mU2P
 9bW/qXDsSdOStyfuXbG7Q3s2z9s5Voj9okgBiA5DUD3DuplVHG/3x8do8ZHrvMoV
 k59ORecx29g0nBaVMjT13gH1XfaqI3W52qff6yksqqByh+5urhGXeYzvQ07M9ldm
 eUA8NxNad+6Gir6AcMN+COA+W8oOP17gvoSuFlUhdM/MZvPP0Gb8GkNk3o2Kfil/
 JjvcHJHCAZv6x1L7jhFhAmTUvR9ibmMJDmXJM2tIHvS1HrHNfKAIyxy00GAVg7TN
 f5Iv0+vqB7C6PHzMYIIQpZ3hrJL2GR6jdToPdAWIfr5BzugglDIRUlhEIsxhSXQn
 WMmoif5bKS8wxQRyP2F3FPv+eKYT2XVlVri3LHBkqKbkJW/sqJWHHFGIdaDrwVhX
 vZlmkT07PD3jbQ==
 =OS2H
 -----END PGP SIGNATURE-----

Merge tag 'irq-drivers-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq controller updates from Thomas Gleixner:
 "Update for interrupt chip drivers:

   - Convert the generic interrupt chip to lock guards to remove copy &
     pasta boilerplate code and gotos.

   - A new driver fot the interrupt controller in the EcoNet EN751221
     MIPS SoC.

   - Extend the SG2042-MSI driver to support the new SG2044 SoC

   - Updates and cleanups for the (ancient) VT8500 driver

   - Improve the scalability of the ARM GICV4.1 ITS driver by utilizing
     node local copies a VM's interrupt translation table when possible.
     This results in a 12% reduction of VM IPI latency in certain
     workloads.

   - The usual cleanups and improvements all over the place"

* tag 'irq-drivers-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  irqchip/irq-pruss-intc: Simplify chained interrupt handler setup
  irqchip/gic-v4.1: Use local 4_1 ITS to generate VSGI
  irqchip/econet-en751221: Switch to of_fwnode_handle()
  irqchip/irq-vt8500: Switch to irq_domain_create_*()
  irqchip/econet-en751221: Switch to irq_domain_create_linear()
  irqchip/irq-vt8500: Use fewer global variables and add error handling
  irqchip/irq-vt8500: Use a dedicated chained handler function
  irqchip/irq-vt8500: Don't require 8 interrupts from a chained controller
  irqchip/irq-vt8500: Drop redundant copy of the device node pointer
  irqchip/irq-vt8500: Split up ack/mask functions
  irqchip/sg2042-msi: Fix wrong type cast in sg2044_msi_irq_ack()
  irqchip/sg2042-msi: Add the Sophgo SG2044 MSI interrupt controller
  irqchip/sg2042-msi: Introduce configurable chipinfo for SG2042
  irqchip/sg2042-msi: Rename functions and data structures to be SG2042 agnostic
  dt-bindings: interrupt-controller: Add Sophgo SG2044 MSI controller
  genirq/generic-chip: Fix incorrect lock guard conversions
  genirq/generic-chip: Remove unused lock wrappers
  irqchip: Convert generic irqchip locking to guards
  gpio: mvebu: Convert generic irqchip locking to guard()
  ARM: orion/gpio:: Convert generic irqchip locking to guard()
  ...
2025-05-27 08:00:46 -07:00
Linus Torvalds 60c1d948f7 Updates for the generic interrupt subsystem core code:
- Address a long standing subtle problem in the CPU hotplug code for
     affinity-managed interrupts.
 
     Affinity-managed interrupts are shut down by the core code when the
     last CPU in the affinity set goes offline and started up again when the
     first CPU in the affinity set becomes online again. This unfortunately
     does not take into account whether an interrupt has been disabled
     before the last CPU goes offline and starts up the interrupt
     unconditionally when the first CPU becomes online again. That's
     obviously not what drivers expect.
 
     Address this by preserving the disabled state for affinity-managed
     interrupts accross these CPU hotplug operations. All non-managed
     interrupts are not affected by this because startup/shutdown is coupled
     to request/free_irq() which obviously has to reset state.
 
   - Support three-cell scheme interrupts to allow GPIO drivers to specify
     interrupts from an already existing scheme
 
   - Switch the interrupt subsystem core to lock guards. This gets rid of
     quite some copy & pasta boilerplate code all over the place.
 
   - The usual small cleanups and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzesoTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYunD/44Z1kB7HGO97GC+Xyfit06ZDBBO794
 CLoo6cpZ2cU7f7YuBnKVjXcW6S9WUcWwnLg0lhxoIBsKDzTTwRPX8UM5Ro4G0YvZ
 ovCQGHKcps8c3ord6dZnyCV/9Ktzr30g5PCzFQkSLKM38DTKgOTH8pQnKYox0XlL
 VGa5ExlWxrOF40GEFXWsJLIyYo2B3LzENZaWihT+6mtW6+ry1ZamW9g/1sron8ad
 cd6UEgQvmNKKscaIqOW4hgDGr4F99oPzyyRGBd+uyqzpeOEH1wGbN5EFu9Mfuy+I
 QZDusm3muIovAhRRhSR7XNQOv13D/RDkjws9sGWDWAVlnFnQTpov7f6cm7ofCkvF
 H898oup43DA97+FJqWo4HlG/37T3gRaQfP0BED0u3vZQ0SWmgTO3eAS/J1OPxvO/
 lWcoPIpPIvCWJYC7XBV1vQi47Kb+gTUNtVID5p8e/6hsE2H2TtOQ6T2rWbVZrkf4
 0QzNCn7V0HaBLvv2ztJ/A3HlwMqmz7DGO+Q0nGG92SmsccuMt0MRP8zv2cbrGhp5
 bzYUD1ZuJV9GQ4XgQ3RG+8dVur6Nu84enjIOewzyyBnOIg3bRLNuJ8RdmgIotiCF
 Q+Si8iXrON0wAPqO36Z0n3Xg05okPZkTGyemgKl60n8lmK8dRgwkH1PxngpfK0Vw
 NKg1wG8Kfc1hRg==
 =xHtm
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq core updates from Thomas Gleixner:
 "Updates for the generic interrupt subsystem core code:

   - Address a long standing subtle problem in the CPU hotplug code for
     affinity-managed interrupts.

     Affinity-managed interrupts are shut down by the core code when the
     last CPU in the affinity set goes offline and started up again when
     the first CPU in the affinity set becomes online again.

     This unfortunately does not take into account whether an interrupt
     has been disabled before the last CPU goes offline and starts up
     the interrupt unconditionally when the first CPU becomes online
     again.

     That's obviously not what drivers expect.

     Address this by preserving the disabled state for affinity-managed
     interrupts accross these CPU hotplug operations. All non-managed
     interrupts are not affected by this because startup/shutdown is
     coupled to request/free_irq() which obviously has to reset state.

   - Support three-cell scheme interrupts to allow GPIO drivers to
     specify interrupts from an already existing scheme

   - Switch the interrupt subsystem core to lock guards. This gets rid
     of quite some copy & pasta boilerplate code all over the place.

   - The usual small cleanups and improvements all over the place"

* tag 'irq-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
  genirq/irqdesc: Remove double locking in hwirq_show()
  genirq: Retain disable depth for managed interrupts across CPU hotplug
  genirq: Bump the size of the local variable for sprintf()
  genirq/manage: Use the correct lock guard in irq_set_irq_wake()
  genirq: Consistently use '%u' format specifier for unsigned int variables
  genirq: Ensure flags in lock guard is consistently initialized
  genirq: Fix inverted condition in handle_nested_irq()
  genirq/cpuhotplug: Fix up lock guards conversion brainf..t
  genirq: Use scoped_guard() to shut clang up
  genirq: Remove unused remove_percpu_irq()
  genirq: Remove irq_[get|put]_desc*()
  genirq/manage: Rework irq_set_irqchip_state()
  genirq/manage: Rework irq_get_irqchip_state()
  genirq/manage: Rework teardown_percpu_nmi()
  genirq/manage: Rework prepare_percpu_nmi()
  genirq/manage: Rework disable_percpu_irq()
  genirq/manage: Rework irq_percpu_is_enabled()
  genirq/manage: Rework enable_percpu_irq()
  genirq/manage: Rework irq_set_parent()
  genirq/manage: Rework can_request_irq()
  ...
2025-05-27 07:46:58 -07:00
Linus Torvalds 0c1494015f Updates for the generic and architecture entry code:
- Move LoongArch and RISC-V ret_from_fork() implementations to C code so
     that syscall_exit_user_mode() can be inlined.
 
   - Split the RISC-V ret_from_fork() implementation into return to user and
     return to kernel, which gives a measurable performance improvement.
 
   - Inline syscall_exit_user_mode() which benefits all architectures by
     avoiding a function call and letting the compiler do better
     optimizations.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmgzdscTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoey4D/9VAZsXsPpYkeR+mBtfy5rJFQtDSbT5
 wBYOJcrQOiekfyHXTn+YyY3EtIKyqzK98Bm48f1C3DgfLU1S3J5hK/YH3HmRHGc+
 50WSy0q2t2OgdFObxAq56paSYIBW10KKVqyXPO/mQ0oLgECf1nai8NgV64aU1ET7
 jPQHGNZuZLKm8jKl5OcFFXWSFyGO9SPBfae5FEGH/0e7LPv62DP0ph1bQ1PLmHCb
 8QKWJV56zxYWDUP4Kjojy62RcG+hBeraNMqnxtzKmauBhUyX21MJdKI3OQwbfu2U
 r3qQG2Y/BKOWs6jSb7yvOO+NKWAGIPD7iMMxtJs0vJzjRMDE9pkkfyPFvzQfcqGn
 gLo6Dp5VxSLfGYoNFvrrQcojrcpvInRUidlZZBykogHb07RCfeXBMkvCxuAuPaDh
 MoH+NeTFCi2oTkc2VHlpBC1+RCAcQ8cz1CqxXDDOXazSRqVrnLnflqLnP0Ldxzcn
 jyGv+1/iP/Fz1w3HtEdIeHrHPY7SgqR4RkOkT11KVGYc2h1PpbHUws2PAxjst9gB
 C3iNnR+izFzg/wjQZ7opHvJvXTJRgEAgyWly3GJorT927G8VA2SiAdzOAsRdCnBG
 g7gEZEQ48MtOr7v5YaviAerAikkJWgLOU+X5pZsrha+DSme8mn5iwhsposJpFsJy
 VHEmKrt5vpxrpg==
 =sbxa
 -----END PGP SIGNATURE-----

Merge tag 'core-entry-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core entry code updates from Thomas Gleixner:
 "Updates for the generic and architecture entry code:

   - Move LoongArch and RISC-V ret_from_fork() implementations to C code
     so that syscall_exit_user_mode() can be inlined

   - Split the RISC-V ret_from_fork() implementation into return to user
     and return to kernel, which gives a measurable performance
     improvement

   - Inline syscall_exit_user_mode() which benefits all architectures by
     avoiding a function call and letting the compiler do better
     optimizations"

* tag 'core-entry-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  LoongArch: entry: Fix include order
  entry: Inline syscall_exit_to_user_mode()
  LoongArch: entry: Migrate ret_from_fork() to C
  riscv: entry: Split ret_from_fork() into user and kernel
  riscv: entry: Convert ret_from_fork() to C
2025-05-27 07:44:22 -07:00
Linus Torvalds ddddf9d64f Performance events updates for v6.16:
Core & generic-arch updates:
 
  - Add support for dynamic constraints and propagate it to
    the Intel driver (Kan Liang)
 
  - Fix & enhance driver-specific throttling support (Kan Liang)
 
  - Record sample last_period before updating on the
    x86 and PowerPC platforms (Mark Barnett)
 
  - Make perf_pmu_unregister() usable (Peter Zijlstra)
 
  - Unify perf_event_free_task() / perf_event_exit_task_context()
    (Peter Zijlstra)
 
  - Simplify perf_event_release_kernel() and perf_event_free_task()
    (Peter Zijlstra)
 
  - Allocate non-contiguous AUX pages by default (Yabin Cui)
 
 Uprobes updates:
 
  - Add support to emulate NOP instructions (Jiri Olsa)
 
  - selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa)
 
 x86 Intel PMU enhancements:
 
  - Support Intel Auto Counter Reload [ACR] (Kan Liang)
 
  - Add PMU support for Clearwater Forest (Dapeng Mi)
 
  - Arch-PEBS preparatory changes: (Dapeng Mi)
 
    - Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
    - Decouple BTS initialization from PEBS initialization
    - Introduce pairs of PEBS static calls
 
 x86 AMD PMU enhancements:
 
  - Use hrtimer for handling overflows in the AMD uncore driver
    (Sandipan Das)
 
  - Prevent UMC counters from saturating (Sandipan Das)
 
 Fixes and cleanups:
 
  - Fix put_ctx() ordering (Frederic Weisbecker)
 
  - Fix irq work dereferencing garbage (Frederic Weisbecker)
 
  - Misc fixes and cleanups (Changbin Du, Frederic Weisbecker,
    Ian Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang,
    Sandipan Das, Thorsten Blum)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy4zoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j6QRAAvQ4GBPrdJLb8oXkLjCmWSp9PfM1h2IW0
 reUrcV0BPRAwz4T60QEU2KyiEjvKxNghR6bNw4i3slAZ8EFwP9eWE/0ZYOo5+W/N
 wv8vsopv/oZd2L2G5TgxDJf+tLPkqnTvp651LmGAbquPFONN1lsya9UHVPnt2qtv
 fvFhjW6D828VoevRcUCsdoEUNlFDkUYQ2c3M1y5H2AI6ILDVxLsp5uYtuVUP+2lQ
 7UI/elqRIIblTGT7G9LvTGiXZMm8T58fe1OOLekT6NdweJ3XEt1kMdFo/SCRYfzU
 eDVVVLSextZfzBXNPtAEAlM3aSgd8+4m5sACiD1EeOUNjo5J9Sj1OOCa+bZGF/Rl
 XNv5Kcp6Kh1T4N5lio8DE/NabmHDqDMbUGfud+VTS8uLLku4kuOWNMxJTD1nQ2Zz
 BMfJhP89G9Vk07F9fOGuG1N6mKhIKNOgXh0S92tB7XDHcdJegueu2xh4ZszBL1QK
 JVXa4DbnDj+y0LvnV+A5Z6VILr5RiCAipDb9ascByPja6BbN10Nf9Aj4nWwRTwbO
 ut5OK/fDKmSjEHn1+a42d4iRxdIXIWhXCyxEhH+hJXEFx9htbQ3oAbXAEedeJTlT
 g9QYGAjL96QEd0CqviorV8KyU59nVkEPoLVCumXBZ0WWhNwU6GdAmsW1hLfxQdLN
 sp+XHhfxf8M=
 =tPRs
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:
 "Core & generic-arch updates:

   - Add support for dynamic constraints and propagate it to the Intel
     driver (Kan Liang)

   - Fix & enhance driver-specific throttling support (Kan Liang)

   - Record sample last_period before updating on the x86 and PowerPC
     platforms (Mark Barnett)

   - Make perf_pmu_unregister() usable (Peter Zijlstra)

   - Unify perf_event_free_task() / perf_event_exit_task_context()
     (Peter Zijlstra)

   - Simplify perf_event_release_kernel() and perf_event_free_task()
     (Peter Zijlstra)

   - Allocate non-contiguous AUX pages by default (Yabin Cui)

  Uprobes updates:

   - Add support to emulate NOP instructions (Jiri Olsa)

   - selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa)

  x86 Intel PMU enhancements:

   - Support Intel Auto Counter Reload [ACR] (Kan Liang)

   - Add PMU support for Clearwater Forest (Dapeng Mi)

   - Arch-PEBS preparatory changes: (Dapeng Mi)
       - Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
       - Decouple BTS initialization from PEBS initialization
       - Introduce pairs of PEBS static calls

  x86 AMD PMU enhancements:

   - Use hrtimer for handling overflows in the AMD uncore driver
     (Sandipan Das)

   - Prevent UMC counters from saturating (Sandipan Das)

  Fixes and cleanups:

   - Fix put_ctx() ordering (Frederic Weisbecker)

   - Fix irq work dereferencing garbage (Frederic Weisbecker)

   - Misc fixes and cleanups (Changbin Du, Frederic Weisbecker, Ian
     Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang, Sandipan
     Das, Thorsten Blum)"

* tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  perf/headers: Clean up <linux/perf_event.h> a bit
  perf/uapi: Clean up <uapi/linux/perf_event.h> a bit
  perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h>
  mips/perf: Remove driver-specific throttle support
  xtensa/perf: Remove driver-specific throttle support
  sparc/perf: Remove driver-specific throttle support
  loongarch/perf: Remove driver-specific throttle support
  csky/perf: Remove driver-specific throttle support
  arc/perf: Remove driver-specific throttle support
  alpha/perf: Remove driver-specific throttle support
  perf/apple_m1: Remove driver-specific throttle support
  perf/arm: Remove driver-specific throttle support
  s390/perf: Remove driver-specific throttle support
  powerpc/perf: Remove driver-specific throttle support
  perf/x86/zhaoxin: Remove driver-specific throttle support
  perf/x86/amd: Remove driver-specific throttle support
  perf/x86/intel: Remove driver-specific throttle support
  perf: Only dump the throttle log for the leader
  perf: Fix the throttle logic for a group
  perf/core: Add the is_event_in_freq_mode() helper to simplify the code
  ...
2025-05-26 15:40:23 -07:00
Linus Torvalds eaed94d1f6 Scheduler updates for v6.16:
Core & fair scheduler changes:
 
   - Tweak wait_task_inactive() to force dequeue sched_delayed tasks
     (John Stultz)
 
   - Adhere to place_entity() constraints (Peter Zijlstra)
 
   - Allow decaying util_est when util_avg > CPU capacity (Pierre Gondois)
 
   - Fix up wake_up_sync() vs DELAYED_DEQUEUE (Xuewen Yan)
 
 Energy management:
 
   - Introduce sched_update_asym_prefer_cpu() (K Prateek Nayak)
 
   - cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings change
     (K Prateek Nayak)
 
   - Align uclamp and util_est and call before freq update (Xuewen Yan)
 
 CPU isolation:
 
   - Make use of more than one housekeeping CPU (Phil Auld)
 
 RT scheduler:
 
   - Fix race in push_rt_task() (Harshit Agarwal)
 
   - Add kernel cmdline option for rt_group_sched (Michal Koutný)
 
 Scheduler topology support:
 
   - Improve topology_span_sane speed (Steve Wahl)
 
 Scheduler debugging:
 
   - Move and extend the sched_process_exit() tracepoint (Andrii Nakryiko)
 
   - Add RT_GROUP WARN checks for non-root task_groups (Michal Koutný)
 
   - Fix trace_sched_switch(.prev_state) (Peter Zijlstra)
 
   - Untangle cond_resched() and live-patching (Peter Zijlstra)
 
 Fixes and cleanups:
 
   - Misc fixes and cleanups (K Prateek Nayak, Michal Koutný,
     Peter Zijlstra, Xuewen Yan)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy50ARHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jFQQ/+KXl2XDg1V/VVmMG8GmtDlR29V3M3ricy
 D7/2s0D1Y1ErHb+pRMBG31EubT9/bXjUshWIuuf51DciSLBmpELHxY5J+AevRa0L
 /pHFwSvP6H5pDakI/xZ01FlYt7PxZGs+1m1o2615Mbwq6J2bjZTan54CYzrdpLOy
 Nqb3OT4tSqU1+7SV7hVForBpZp9u3CvVBRt/wE6vcHltW/I486bM8OCOd2XrUlnb
 QoIRliGI9KHpqCpbAeKPRSKXpf9tZv/AijZ+0WUu2yY8iwSN4p3RbbbwdCipjVQj
 w5I5oqKI6cylFfl2dEFWXVO+tLBihs06w8KSQrhYmQ9DUu4RGBVM9ORINGDBPejL
 bvoQh1mAkqvIL+oodujdbMDIqLupvOEtVSvwzR7SJn8BJSB00js88ngCWLjo/CcU
 imLbWy9FSBLvOswLBzQthgAJEj+ejCkOIbcvM2lINWhX/zNsMFaaqYcO1wRunGGR
 SavTI1s+ZksCQY6vCwRkwPrOZjyg91TA/q4FK102fHL1IcthH6xubE4yi4lTIUYs
 L56HuGm8e7Shc8M2Y5rAYsVG3GoIHFLXnptOn2HnCRWaAAJYsBaLUlzoBy9MxCfw
 I2YVDCylkQxevosSi2XxXo3tbM6auISU9SelAT/dAz32V1rsjWQojRJXeGYKIbu7
 KBuN/dLItW0=
 =s/ra
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Core & fair scheduler changes:

   - Tweak wait_task_inactive() to force dequeue sched_delayed tasks
     (John Stultz)

   - Adhere to place_entity() constraints (Peter Zijlstra)

   - Allow decaying util_est when util_avg > CPU capacity (Pierre
     Gondois)

   - Fix up wake_up_sync() vs DELAYED_DEQUEUE (Xuewen Yan)

  Energy management:

   - Introduce sched_update_asym_prefer_cpu() (K Prateek Nayak)

   - cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings
     change (K Prateek Nayak)

   - Align uclamp and util_est and call before freq update (Xuewen Yan)

  CPU isolation:

   - Make use of more than one housekeeping CPU (Phil Auld)

  RT scheduler:

   - Fix race in push_rt_task() (Harshit Agarwal)

   - Add kernel cmdline option for rt_group_sched (Michal Koutný)

  Scheduler topology support:

   - Improve topology_span_sane speed (Steve Wahl)

  Scheduler debugging:

   - Move and extend the sched_process_exit() tracepoint (Andrii
     Nakryiko)

   - Add RT_GROUP WARN checks for non-root task_groups (Michal Koutný)

   - Fix trace_sched_switch(.prev_state) (Peter Zijlstra)

   - Untangle cond_resched() and live-patching (Peter Zijlstra)

  Fixes and cleanups:

   - Misc fixes and cleanups (K Prateek Nayak, Michal Koutný, Peter
     Zijlstra, Xuewen Yan)"

* tag 'sched-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  sched/uclamp: Align uclamp and util_est and call before freq update
  sched/util_est: Simplify condition for util_est_{en,de}queue()
  sched/fair: Fixup wake_up_sync() vs DELAYED_DEQUEUE
  sched,livepatch: Untangle cond_resched() and live-patching
  sched/core: Tweak wait_task_inactive() to force dequeue sched_delayed tasks
  sched/fair: Adhere to place_entity() constraints
  sched/debug: Print the local group's asym_prefer_cpu
  cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings change
  sched/topology: Introduce sched_update_asym_prefer_cpu()
  sched/fair: Use READ_ONCE() to read sg->asym_prefer_cpu
  sched/isolation: Make use of more than one housekeeping cpu
  sched/rt: Fix race in push_rt_task
  sched: Add annotations to RT_GROUP_SCHED fields
  sched: Add RT_GROUP WARN checks for non-root task_groups
  sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled
  sched: Bypass bandwitdh checks with runtime disabled RT_GROUP_SCHED
  sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
  sched: Add commadline option for RT_GROUP_SCHED toggling
  sched: Always initialize rt_rq's task_group
  sched: Remove unneeed macro wrap
  ...
2025-05-26 15:19:58 -07:00
Linus Torvalds b3570b00dc Locking changes for v6.16:
Futexes:
 
    - Add support for task local hash maps (Sebastian Andrzej Siewior,
      Peter Zijlstra)
 
    - Implement the FUTEX2_NUMA ABI, which feature extends the futex
      interface to be NUMA-aware. On NUMA-aware futexes a second u32
      word containing the NUMA node is added to after the u32 futex value
      word. (Peter Zijlstra)
 
    - Implement the FUTEX2_MPOL ABI, which feature extends the futex
      interface to be mempolicy-aware as well, to further refine futex
      node mappings and lookups. (Peter Zijlstra)
 
   Locking primitives:
 
    - Misc cleanups (Andy Shevchenko, Borislav Petkov, Colin Ian King,
                     Ingo Molnar, Nam Cao, Peter Zijlstra)
 
   Lockdep:
 
    - Prevent abuse of lockdep subclasses (Waiman Long)
    - Add number of dynamic keys to /proc/lockdep_stats (Waiman Long)
 
 Plus misc cleanups and fixes.
 
 Note that the tree includes the following dependent out-of-subsystem
 changes as well:
 
  - rcuref: Provide rcuref_is_dead()
  - mm: Add vmalloc_huge_node()
  - mm: Add the mmap_read_lock guard to <linux/mmap_lock.h>
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy3E8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1isNw/9FS6+ZReiV3NLHvhwIw8+6U2vV733wLY+
 mFzDk2CRwv2d6xg+QUrhLNI93i2fZnwNvK1f6LcRZMa1pNmwCcEghKgm0G+fRgbv
 skiGrlkUCoEqsDUxRW++/aTBcMo0vqG3NOObnUOrddG2W9tfrR8jq/EwlzB99dO7
 q8qaBNl9W1vLT3gh9/RPP5uKt0NKIf8ObvsyhWCGaywg81h2lC4AHf0Xlj3ZD95T
 TO5jhUhl/muhYtaqxeYPK0gDtCrgFz8NwZdjKx1nyP7Gbko6+L50AvOVXog0SIAU
 nncftvutGJg2ki7dbSYPDoHQrHO0JsF1vUfVZRjaKFebWpFo2yYdNMbITOeXVhSC
 QSpbH2qvyn21nT/YSj9dottHWBoNYBEgrcSf6DO4g0d8A0Jh7egXjQdA852RpeQ0
 LWGYx4rfiKhnjiXlKKQHrURZkcxxa40o+ls3RfFl2/kWA+7aUybvw6nAeDEkV0oL
 s2U0vZxsY37EPWDm40rTe9r4YpPqcB65i9YIesPzhtbcHJVmN0gts0o5l+x53GhR
 CeftFiiUi2nm6JaT+1wGvBDT3hQ8+NZ8GkPSeA6pEJWE3i4KquZlcBZLOSLZ3k/B
 df58zQi99Yun33is5f1kqDNspqvJOg/1nxUK68PgNSdCMKeuZkJYrcmh/rKNnXSC
 f7M1XHoWFb0=
 =La/x
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "Futexes:

   - Add support for task local hash maps (Sebastian Andrzej Siewior,
     Peter Zijlstra)

   - Implement the FUTEX2_NUMA ABI, which feature extends the futex
     interface to be NUMA-aware. On NUMA-aware futexes a second u32 word
     containing the NUMA node is added to after the u32 futex value word
     (Peter Zijlstra)

   - Implement the FUTEX2_MPOL ABI, which feature extends the futex
     interface to be mempolicy-aware as well, to further refine futex
     node mappings and lookups (Peter Zijlstra)

  Locking primitives:

   - Misc cleanups (Andy Shevchenko, Borislav Petkov, Colin Ian King,
     Ingo Molnar, Nam Cao, Peter Zijlstra)

  Lockdep:

   - Prevent abuse of lockdep subclasses (Waiman Long)

   - Add number of dynamic keys to /proc/lockdep_stats (Waiman Long)

  Plus misc cleanups and fixes"

* tag 'locking-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  selftests/futex: Fix spelling mistake "unitiliazed" -> "uninitialized"
  futex: Correct the kernedoc return value for futex_wait_setup().
  tools headers: Synchronize prctl.h ABI header
  futex: Use RCU_INIT_POINTER() in futex_mm_init().
  selftests/futex: Use TAP output in futex_numa_mpol
  selftests/futex: Use TAP output in futex_priv_hash
  futex: Fix kernel-doc comments
  futex: Relax the rcu_assign_pointer() assignment of mm->futex_phash in futex_mm_init()
  futex: Fix outdated comment in struct restart_block
  locking/lockdep: Add number of dynamic keys to /proc/lockdep_stats
  locking/lockdep: Prevent abuse of lockdep subclass
  locking/lockdep: Move hlock_equal() to the respective #ifdeffery
  futex,selftests: Add another FUTEX2_NUMA selftest
  selftests/futex: Add futex_numa_mpol
  selftests/futex: Add futex_priv_hash
  selftests/futex: Build without headers nonsense
  tools/perf: Allow to select the number of hash buckets
  tools headers: Synchronize prctl.h ABI header
  futex: Implement FUTEX2_MPOL
  futex: Implement FUTEX2_NUMA
  ...
2025-05-26 14:42:07 -07:00
Linus Torvalds 07046958f6 RCU pull request for v6.16
Summary of changes:
 - Removed swake_up_one_online() workaround
 - Reverted an incorrect rcuog wake-up fix from offline softirq
 - Rust RCU Guard methods marked as inline
 - Updated MAINTAINERS with Joel’s and Zqiang's new email address
 - Replaced magic constant in rcu_seq_done_exact() with named constant
 - Added warning mechanism to validate rcu_seq_done_exact()
 - Switched SRCU polling API to use rcu_seq_done_exact()
 - Commented on redundant delta check in rcu_seq_done_exact()
 - Made ->gpwrap tests in rcutorture more frequent
 - Fixed reuse of ARM64 images in rcutorture
 - rcutorture improved to check Kconfig and reader conflict handling
 - Extracted logic from rcu_torture_one_read() for clarity
 - Updated LWN RCU API documentation links
 - Enabled --do-rt in torture.sh for CONFIG_PREEMPT_RT
 - Added tests for SRCU up/down reader primitives
 - Added comments and delays checks in rcutorture
 - Deprecated srcu_read_lock_lite() and srcu_read_unlock_lite() via checkpatch
 - Added --do-normal and --do-no-normal to torture.sh
 - Added RCU Rust binding tests to torture.sh
 - Reduced CPU overcommit and removed MAXSMP/CPUMASK_OFFSTACK in TREE01
 - Replaced kmalloc() with kcalloc() in rcuscale
 - Refined listRCU example code for stale data elimination
 - Fixed hardirq count bug for x86 in cpu_stall_cputime
 - Added safety checks in rcu/nocb for offloaded rdp access
 - Other miscellaneous changes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEcoCIrlGe4gjE06JJqA4nf2o45hAFAmgoF5oACgkQqA4nf2o4
 5hDvVw//TNsJ/g0HTMu02uXMmtFIrgvpTnH7OEGJ+2p/KErrmWYsBJQw41ueLAQL
 Drtq3q9888UFF5LLA43HC88DFmT9uV8V8TmmURH+pZWdmJY1Ekn8UBSBhDPGGpC5
 sGIO2jJKjHN8G7fyJKoPtL9jxKSulHF/XQTIL2pP23jopAIwosoCHVAwGvnGVvBC
 smXfMSu+bd3IifNFroodsqjVXgnNQwWUNboOkz0KfkiiosgZsWWW8DaM3NGjdp+C
 tUHLs1zfC6sgJUjdpokTE3TcNudlMgVlB2Quj5jhh1YvsvedgIJXl4wpR6JVutyN
 F9awKt1AZkyZ+cTp+JpohaWaN9aKfNNG7jZ+rxQ0VcuRh35wmBJtiWNjEtJ38R82
 kTC1RI7MEus+6OZRt92jv5TNSa9t3wHbi5fBjNRiQ8PYq5cibZy7Lyrj2JOK7Zqs
 pgmdUnhQH2Uhf52b+clG5hWO55gEtACY8pin6kNewClcRtz04Jew7gkiYDGka4F4
 EXbuDHSWi25eSb3FzT2BqR72OZcJ0kv747OTp+2yTv2TaBA5p+OD8hvL/WbWC2Ok
 DK1YQ4RgEerTSZ4PbgPtWkNnlf6xjdWBaYNwmo+G/DgfjPoTOy1Jp73Z4b1AqSB5
 IPEQy1d/799QgGTYkbrvRtvWHg8yfOMz3ByZoHg31rafr0AsrXM=
 =6mun
 -----END PGP SIGNATURE-----

Merge tag 'next.2025.05.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Joel Fernandes:
 - Removed swake_up_one_online() workaround
 - Reverted an incorrect rcuog wake-up fix from offline softirq
 - Rust RCU Guard methods marked as inline
 - Updated MAINTAINERS with Joel’s and Zqiang's new email address
 - Replaced magic constant in rcu_seq_done_exact() with named constant
 - Added warning mechanism to validate rcu_seq_done_exact()
 - Switched SRCU polling API to use rcu_seq_done_exact()
 - Commented on redundant delta check in rcu_seq_done_exact()
 - Made ->gpwrap tests in rcutorture more frequent
 - Fixed reuse of ARM64 images in rcutorture
 - rcutorture improved to check Kconfig and reader conflict handling
 - Extracted logic from rcu_torture_one_read() for clarity
 - Updated LWN RCU API documentation links
 - Enabled --do-rt in torture.sh for CONFIG_PREEMPT_RT
 - Added tests for SRCU up/down reader primitives
 - Added comments and delays checks in rcutorture
 - Deprecated srcu_read_lock_lite() and srcu_read_unlock_lite() via checkpatch
 - Added --do-normal and --do-no-normal to torture.sh
 - Added RCU Rust binding tests to torture.sh
 - Reduced CPU overcommit and removed MAXSMP/CPUMASK_OFFSTACK in TREE01
 - Replaced kmalloc() with kcalloc() in rcuscale
 - Refined listRCU example code for stale data elimination
 - Fixed hardirq count bug for x86 in cpu_stall_cputime
 - Added safety checks in rcu/nocb for offloaded rdp access
 - Other miscellaneous changes

* tag 'next.2025.05.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (27 commits)
  rcutorture: Fix issue with re-using old images on ARM64
  rcutorture: Remove MAXSMP and CPUMASK_OFFSTACK from TREE01
  rcutorture: Reduce TREE01 CPU overcommit
  torture: Check for "Call trace:" as well as "Call Trace:"
  rcutorture: Perform more frequent testing of ->gpwrap
  torture: Add testing of RCU's Rust bindings to torture.sh
  torture: Add --do-{,no-}normal to torture.sh
  checkpatch: Deprecate srcu_read_lock_lite() and srcu_read_unlock_lite()
  rcutorture: Comment invocations of tick_dep_set_task()
  rcu/nocb: Add Safe checks for access offloaded rdp
  rcuscale: using kcalloc() to relpace kmalloc()
  doc/RCU/listRCU: refine example code for eliminating stale data
  doc: Update LWN RCU API links in whatisRCU.rst
  Revert "rcu/nocb: Fix rcuog wake-up from offline softirq"
  rust: sync: rcu: Mark Guard methods as inline
  rcu/cpu_stall_cputime: fix the hardirq count for x86 architecture
  rcu: Remove swake_up_one_online() bandaid
  MAINTAINERS: Update Zqiang's email address
  rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RT
  srcu: Use rcu_seq_done_exact() for polling API
  ...
2025-05-26 14:20:50 -07:00
Rafael J. Wysocki 76524ffd10 Merge branches 'pm-runtime' and 'pm-sleep'
Merge updates related to system sleep handling and runtime PM for 6.16-rc1:

 - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
   Kalla).

 - Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki).

 - Add new devm_ functions for enabling runtime PM and runtime PM
   reference counting (Bence Csókás).

 - Remove size arguments from strscpy() calls in the hibernation core
   code (Thorsten Blum).

 - Adjust the handling of devices with asynchronous suspend enabled
   during system suspend and resume to start resuming them immediately
   after resuming their parents and to start suspending such a device
   immediately after suspending its first child (Rafael Wysocki).

 - Adjust messages printed during tasks freezing to avoid using
   pr_cont() (Andrew Sayers, Paul Menzel).

 - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
   Zhang).

 - Add missing wakeup source attribute relax_count to sysfs and
   remove the space character at the end ofi the string produced by
   pm_show_wakelocks() (Zijun Hu).

 - Add configurable pm_test delay for hibernation (Zihuan Zhang).

 - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
   cypd4226 device on Tegra boards from suspending prematurely (Jon
   Hunter).

 - Unbreak printing PM debug messages during hibernation and clean up
   some related code (Rafael Wysocki).

* pm-runtime:
  PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn()
  PM: sysfs: Move debug runtime PM attributes to runtime_attrs[]
  PM: runtime: Add new devm functions

* pm-sleep:
  PM: freezer: Rewrite restarting tasks log to remove stray *done.*
  PM: sleep: Introduce pm_sleep_transition_in_progress()
  PM: sleep: Introduce pm_suspend_in_progress()
  PM: sleep: Print PM debug messages during hibernation
  ucsi_ccg: Disable async suspend in ucsi_ccg_probe()
  PM: hibernate: add configurable delay for pm_test
  PM: wakeup: Delete space in the end of string shown by pm_show_wakelocks()
  PM: wakeup: Add missing wakeup source attribute relax_count
  PM: sleep: Remove unnecessary !!
  PM: sleep: Use two lines for "Restarting..." / "done" messages
  PM: sleep: Make suspend of devices more asynchronous
  PM: sleep: Suspend async parents after suspending children
  PM: sleep: Resume children after resuming the parent
  PM: hibernate: Remove size arguments when calling strscpy()
2025-05-26 21:21:58 +02:00
Linus Torvalds 6f59de9bc0 for-6.16/block-20250523
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmgwnGYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpq9aD/4iqOts77xhWWLrOJWkkhOcV5rREeyppq8X
 MKYul9S4cc4Uin9Xou9a+nab31QBQEk3nsN3kX9o3yAXvkh6yUm36HD8qYNW/46q
 IUkwRQQJ0COyTnexMZQNTbZPQDIYcenXmQxOcrEJ5jC1Jcz0sOKHsgekL+ab3kCy
 fLnuz2ozvjGDMala/NmE8fN5qSlj4qQABHgbamwlwfo4aWu07cwfqn5G/FCYJgDO
 xUvsnTVclom2g4G+7eSSvGQI1QyAxl5QpviPnj/TEgfFBFnhbCSoBTEY6ecqhlfW
 6u59MF/Uw8E+weiuGY4L87kDtBhjQs3UMSLxCuwH7MxXb25ff7qB4AIkcFD0kKFH
 3V5NtwqlU7aQT0xOjGxaHhfPwjLD+FVss4ARmuHS09/Kn8egOW9yROPyetnuH84R
 Oz0Ctnt1IPLFjvGeg3+rt9fjjS9jWOXLITb9Q6nX9gnCt7orCwIYke8YCpmnJyhn
 i+fV4CWYIQBBRKxIT0E/GhJxZOmL0JKpomnbpP2dH8npemnsTCuvtfdrK9gfhH2X
 chBVqCPY8MNU5zKfzdEiavPqcm9392lMzOoOXW2pSC1eAKqnAQ86ZT3r7rLntqE8
 75LxHcvaQIsnpyG+YuJVHvoiJ83TbqZNpyHwNaQTYhDmdYpp2d/wTtTQywX4DuXb
 Y6NDJw5+kQ==
 =1PNK
 -----END PGP SIGNATURE-----

Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - ublk updates:
      - Add support for updating the size of a ublk instance
      - Zero-copy improvements
      - Auto-registering of buffers for zero-copy
      - Series simplifying and improving GET_DATA and request lookup
      - Series adding quiesce support
      - Lots of selftests additions
      - Various cleanups

 - NVMe updates via Christoph:
      - add per-node DMA pools and use them for PRP/SGL allocations
        (Caleb Sander Mateos, Keith Busch)
      - nvme-fcloop refcounting fixes (Daniel Wagner)
      - support delayed removal of the multipath node and optionally
        support the multipath node for private namespaces (Nilay Shroff)
      - support shared CQs in the PCI endpoint target code (Wilfred
        Mallawa)
      - support admin-queue only authentication (Hannes Reinecke)
      - use the crc32c library instead of the crypto API (Eric Biggers)
      - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes
        Reinecke, Leon Romanovsky, Gustavo A. R. Silva)

 - MD updates via Yu:
      - Fix that normal IO can be starved by sync IO, found by mkfs on
        newly created large raid5, with some clean up patches for bdev
        inflight counters

 - Clean up brd, getting rid of atomic kmaps and bvec poking

 - Add loop driver specifically for zoned IO testing

 - Eliminate blk-rq-qos calls with a static key, if not enabled

 - Improve hctx locking for when a plug has IO for multiple queues
   pending

 - Remove block layer bouncing support, which in turn means we can
   remove the per-node bounce stat as well

 - Improve blk-throttle support

 - Improve delay support for blk-throttle

 - Improve brd discard support

 - Unify IO scheduler switching. This should also fix a bunch of lockdep
   warnings we've been seeing, after enabling lockdep support for queue
   freezing/unfreezeing

 - Add support for block write streams via FDP (flexible data placement)
   on NVMe

 - Add a bunch of block helpers, facilitating the removal of a bunch of
   duplicated boilerplate code

 - Remove obsolete BLK_MQ pci and virtio Kconfig options

 - Add atomic/untorn write support to blktrace

 - Various little cleanups and fixes

* tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits)
  selftests: ublk: add test for UBLK_F_QUIESCE
  ublk: add feature UBLK_F_QUIESCE
  selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
  traceevent/block: Add REQ_ATOMIC flag to block trace events
  ublk: run auto buf unregisgering in same io_ring_ctx with registering
  io_uring: add helper io_uring_cmd_ctx_handle()
  ublk: remove io argument from ublk_auto_buf_reg_fallback()
  ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch()
  selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK
  selftests: ublk: support UBLK_F_AUTO_BUF_REG
  ublk: support UBLK_AUTO_BUF_REG_FALLBACK
  ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG
  ublk: prepare for supporting to register request buffer automatically
  ublk: convert to refcount_t
  selftests: ublk: make IO & device removal test more stressful
  nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk
  nvme: introduce multipath_always_on module param
  nvme-multipath: introduce delayed removal of the multipath head node
  nvme-pci: derive and better document max segments limits
  nvme-pci: use struct_size for allocation struct nvme_dev
  ...
2025-05-26 11:39:36 -07:00
Rafael J. Wysocki f34dc28343 Merge branch 'pm-cpufreq'
Merge cpufreq updates for 6.16-rc1:

 - Refactor cpufreq_online(), add and use cpufreq policy locking guards,
   use __free() in policy reference counting, and clean up core cpufreq
   code on top of that (Rafael Wysocki).

 - Fix boost handling on CPU suspend/resume and sysfs updates (Viresh
   Kumar).

 - Fix des_perf clamping with max_perf in amd_pstate_update() (Dhananjay
   Ugwekar).

 - Add offline, online and suspend callbacks to the amd-pstate driver,
   rename and use the existing amd_pstate_epp callbacks in it (Dhananjay
   Ugwekar).

 - Add support for the "Requested CPU Min frequency" BIOS option to the
   amd-pstate driver (Dhananjay Ugwekar).

 - Reset amd-pstate driver mode after running selftests (Swapnil
   Sapkal).

 - Add helper for governor checks to the schedutil cpufreq governor and
   move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki).

 - Populate the cpu_capacity sysfs entries from the intel_pstate driver
   after registering asym capacity support (Ricardo Neri).

 - Add support for enabling Energy-aware scheduling (EAS) to the
   intel_pstate driver when operating in the passive mode on a hybrid
   platform (Rafael Wysocki).

 - Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan
   Chancellor).

 - Drop redundant cpus_read_lock() from store_local_boost() in the
   cpufreq core (Seyediman Seyedarab).

 - Replace sscanf() with kstrtouint() in the cpufreq code and use a
   symbol instead of a raw number in it (Bowen Yu).

 - Add support for autonomous CPU performance state selection to the
   CPPC cpufreq driver (Lifeng Zheng).

* pm-cpufreq: (31 commits)
  cpufreq: CPPC: Add support for autonomous selection
  cpufreq: Update sscanf() to kstrtouint()
  cpufreq: Replace magic number
  cpufreq: drop redundant cpus_read_lock() from store_local_boost()
  cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver()
  cpufreq: intel_pstate: Document hybrid processor support
  cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache
  cpufreq: intel_pstate: EAS support for hybrid platforms
  cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas()
  cpufreq: intel_pstate: Populate the cpu_capacity sysfs entries
  arch_topology: Relocate cpu_scale to topology.[h|c]
  cpufreq/sched: Move cpufreq-specific EAS checks to cpufreq
  cpufreq/sched: schedutil: Add helper for governor checks
  amd-pstate-ut: Reset amd-pstate driver mode after running selftests
  cpufreq/amd-pstate: Add support for the "Requested CPU Min frequency" BIOS option
  cpufreq/amd-pstate: Add offline, online and suspend callbacks for amd_pstate_driver
  cpufreq: Force sync policy boost with global boost on sysfs update
  cpufreq: Preserve policy's boost state after resume
  cpufreq: Introduce policy_set_boost()
  cpufreq: Don't unnecessarily call set_boost()
  ...
2025-05-26 20:19:40 +02:00
Rafael J. Wysocki e481e10ab5 Merge branch 'pm-em'
Merge energy model management code updates for 6.16-rc1:

 - Fix potential division-by-zero error in em_compute_costs() (Yaxiong
   Tian).

 - Fix typos in energy model documentation and example driver code (Moon
   Hee Lee, Atul Kumar Pant).

 - Rearrange the energy model management code and add a new function for
   adjusting a CPU energy model after adjusting the capacity of the
   given CPU to it (Rafael Wysocki).

* pm-em:
  PM: EM: Introduce em_adjust_cpu_capacity()
  PM: EM: Move CPU capacity check to em_adjust_new_capacity()
  PM: EM: Documentation: Fix typos in example driver code
  PM: EM: Documentation: fix typo in energy-model.rst
  PM: EM: Fix potential division-by-zero error in em_compute_costs()
2025-05-26 20:14:58 +02:00
Linus Torvalds 7d7a103d29 vfs-6.16-rc1.pidfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPTwAKCRCRxhvAZXjc
 ov4zAP4yfqKBAz6eMt9CzDgHCdVQJ9Nuur1EiRdot3maPzHTcQEA2hVkJrvVo1Y/
 jCVAf7gmGX1Uu6nCUF6Vjy35g8i20gA=
 =nzYS
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull pidfs updates from Christian Brauner:
 "Features:

   - Allow handing out pidfds for reaped tasks for AF_UNIX SO_PEERPIDFD
     socket option

     SO_PEERPIDFD is a socket option that allows to retrieve a pidfd for
     the process that called connect() or listen(). This is heavily used
     to safely authenticate clients in userspace avoiding security bugs
     due to pid recycling races (dbus, polkit, systemd, etc.)

     SO_PEERPIDFD currently doesn't support handing out pidfds if the
     sk->sk_peer_pid thread-group leader has already been reaped. In
     this case it currently returns EINVAL. Userspace still wants to get
     a pidfd for a reaped process to have a stable handle it can pass
     on. This is especially useful now that it is possible to retrieve
     exit information through a pidfd via the PIDFD_GET_INFO ioctl()'s
     PIDFD_INFO_EXIT flag

     Another summary has been provided by David Rheinsberg:

      > A pidfd can outlive the task it refers to, and thus user-space
      > must already be prepared that the task underlying a pidfd is
      > gone at the time they get their hands on the pidfd. For
      > instance, resolving the pidfd to a PID via the fdinfo must be
      > prepared to read `-1`.
      >
      > Despite user-space knowing that a pidfd might be stale, several
      > kernel APIs currently add another layer that checks for this. In
      > particular, SO_PEERPIDFD returns `EINVAL` if the peer-task was
      > already reaped, but returns a stale pidfd if the task is reaped
      > immediately after the respective alive-check.
      >
      > This has the unfortunate effect that user-space now has two ways
      > to check for the exact same scenario: A syscall might return
      > EINVAL/ESRCH/... *or* the pidfd might be stale, even though
      > there is no particular reason to distinguish both cases. This
      > also propagates through user-space APIs, which pass on pidfds.
      > They must be prepared to pass on `-1` *or* the pidfd, because
      > there is no guaranteed way to get a stale pidfd from the kernel.
      >
      > Userspace must already deal with a pidfd referring to a reaped
      > task as the task may exit and get reaped at any time will there
      > are still many pidfds referring to it

     In order to allow handing out reaped pidfd SO_PEERPIDFD needs to
     ensure that PIDFD_INFO_EXIT information is available whenever a
     pidfd for a reaped task is created by PIDFD_INFO_EXIT. The uapi
     promises that reaped pidfds are only handed out if it is guaranteed
     that the caller sees the exit information:

     TEST_F(pidfd_info, success_reaped)
     {
             struct pidfd_info info = {
                     .mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT,
             };

             /*
              * Process has already been reaped and PIDFD_INFO_EXIT been set.
              * Verify that we can retrieve the exit status of the process.
              */
             ASSERT_EQ(ioctl(self->child_pidfd4, PIDFD_GET_INFO, &info), 0);
             ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS));
             ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT));
             ASSERT_TRUE(WIFEXITED(info.exit_code));
             ASSERT_EQ(WEXITSTATUS(info.exit_code), 0);
     }

     To hand out pidfds for reaped processes we thus allocate a pidfs
     entry for the relevant sk->sk_peer_pid at the time the
     sk->sk_peer_pid is stashed and drop it when the socket is
     destroyed. This guarantees that exit information will always be
     recorded for the sk->sk_peer_pid task and we can hand out pidfds
     for reaped processes

   - Hand a pidfd to the coredump usermode helper process

     Give userspace a way to instruct the kernel to install a pidfd for
     the crashing process into the process started as a usermode helper.
     There's still tricky race-windows that cannot be easily or
     sometimes not closed at all by userspace. There's various ways like
     looking at the start time of a process to make sure that the
     usermode helper process is started after the crashing process but
     it's all very very brittle and fraught with peril

     The crashed-but-not-reaped process can be killed by userspace
     before coredump processing programs like systemd-coredump have had
     time to manually open a PIDFD from the PID the kernel provides
     them, which means they can be tricked into reading from an
     arbitrary process, and they run with full privileges as they are
     usermode helper processes

     Even if that specific race-window wouldn't exist it's still the
     safest and cleanest way to let the kernel provide the pidfd
     directly instead of requiring userspace to do it manually. In
     parallel with this commit we already have systemd adding support
     for this in [1]

     When the usermode helper process is forked we install a pidfd file
     descriptor three into the usermode helper's file descriptor table
     so it's available to the exec'd program

     Since usermode helpers are either children of the system_unbound_wq
     workqueue or kthreadd we know that the file descriptor table is
     empty and can thus always use three as the file descriptor number

     Note, that we'll install a pidfd for the thread-group leader even
     if a subthread is calling do_coredump(). We know that task linkage
     hasn't been removed yet and even if this @current isn't the actual
     thread-group leader we know that the thread-group leader cannot be
     reaped until
     @current has exited

   - Allow telling when a task has not been found from finding the wrong
     task when creating a pidfd

     We currently report EINVAL whenever a struct pid has no tasked
     attached anymore thereby conflating two concepts:

      (1) The task has already been reaped

      (2) The caller requested a pidfd for a thread-group leader but the
          pid actually references a struct pid that isn't used as a
          thread-group leader

     This is causing issues for non-threaded workloads as in where they
     expect ESRCH to be reported, not EINVAL

     So allow userspace to reliably distinguish between (1) and (2)

   - Make it possible to detect when a pidfs entry would outlive the
     struct pid it pinned

   - Add a range of new selftests

  Cleanups:

   - Remove unneeded NULL check from pidfd_prepare() for passed struct
     pid

   - Avoid pointless reference count bump during release_task()

  Fixes:

   - Various fixes to the pidfd and coredump selftests

   - Fix error handling for replace_fd() when spawning coredump usermode
     helper"

* tag 'vfs-6.16-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  pidfs: detect refcount bugs
  coredump: hand a pidfd to the usermode coredump helper
  coredump: fix error handling for replace_fd()
  pidfs: move O_RDWR into pidfs_alloc_file()
  selftests: coredump: Raise timeout to 2 minutes
  selftests: coredump: Fix test failure for slow machines
  selftests: coredump: Properly initialize pointer
  net, pidfs: enable handing out pidfds for reaped sk->sk_peer_pid
  pidfs: get rid of __pidfd_prepare()
  net, pidfs: prepare for handing out pidfds for reaped sk->sk_peer_pid
  pidfs: register pid in pidfs
  net, pidfd: report EINVAL for ESRCH
  release_task: kill the no longer needed get/put_pid(thread_pid)
  pidfs: ensure consistent ENOENT/ESRCH reporting
  exit: move wake_up_all() pidfd waiters into __unhash_process()
  selftest/pidfd: add test for thread-group leader pidfd open for thread
  pidfd: improve uapi when task isn't found
  pidfd: remove unneeded NULL check from pidfd_prepare()
  selftests/pidfd: adapt to recent changes
2025-05-26 10:30:02 -07:00
Linus Torvalds 8dd53535f1 vfs-6.16-rc1.super
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPTwAKCRCRxhvAZXjc
 oi3BAQD/IBxTbAZIe7vEAsuLlBoKbWrzPGvxzd4UeMGo6OY18wEAvvyJM+arQy51
 jS0ZErDOJnPNe7jps+Gh+WDx6d3NMAY=
 =lqAG
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs freezing updates from Christian Brauner:
 "This contains various filesystem freezing related work for this cycle:

   - Allow the power subsystem to support filesystem freeze for suspend
     and hibernate.

     Now all the pieces are in place to actually allow the power
     subsystem to freeze/thaw filesystems during suspend/resume.
     Filesystems are only frozen and thawed if the power subsystem does
     actually own the freeze.

     If the filesystem is already frozen by the time we've frozen all
     userspace processes we don't care to freeze it again. That's
     userspace's job once the process resumes. We only actually freeze
     filesystems if we absolutely have to and we ignore other failures
     to freeze.

     We could bubble up errors and fail suspend/resume if the error
     isn't EBUSY (aka it's already frozen) but I don't think that this
     is worth it. Filesystem freezing during suspend/resume is
     best-effort. If the user has 500 ext4 filesystems mounted and 4
     fail to freeze for whatever reason then we simply skip them.

     What we have now is already a big improvement and let's see how we
     fare with it before making our lives even harder (and uglier) than
     we have to.

   - Allow efivars to support freeze and thaw

     Allow efivarfs to partake to resync variable state during system
     hibernation and suspend. Add freeze/thaw support.

     This is a pretty straightforward implementation. We simply add
     regular freeze/thaw support for both userspace and the kernel.
     efivars is the first pseudofilesystem that adds support for
     filesystem freezing and thawing.

     The simplicity comes from the fact that we simply always resync
     variable state after efivarfs has been frozen. It doesn't matter
     whether that's because of suspend, userspace initiated freeze or
     hibernation. Efivars is simple enough that it doesn't matter that
     we walk all dentries. There are no directories and there aren't
     insane amounts of entries and both freeze/thaw are already
     heavy-handed operations. If userspace initiated a freeze/thaw cycle
     they would need CAP_SYS_ADMIN in the initial user namespace (as
     that's where efivarfs is mounted) so it can't be triggered by
     random userspace. IOW, we really really don't care"

* tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  f2fs: fix freezing filesystem during resize
  kernfs: add warning about implementing freeze/thaw
  efivarfs: support freeze/thaw
  power: freeze filesystems during suspend/resume
  libfs: export find_next_child()
  super: add filesystem freezing helpers for suspend and hibernate
  gfs2: pass through holder from the VFS for freeze/thaw
  super: use common iterator (Part 2)
  super: use a common iterator (Part 1)
  super: skip dying superblocks early
  super: simplify user_get_super()
  super: remove pointless s_root checks
  fs: allow all writers to be frozen
  locking/percpu-rwsem: add freezable alternative to down_read
2025-05-26 09:33:44 -07:00
Linus Torvalds 181d8e399f vfs-6.16-rc1.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPTwAKCRCRxhvAZXjc
 om0+AQDMxKLweJXplqQQ7jxuvW2dEa60YpE2EalEKWGg9YA3KgEA3nI4kyKMKn7Y
 PRFXgIcKvhs62oJLKsq8SGQUqExqvAE=
 =atEw
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Use folios for symlinks in the page cache

     FUSE already uses folios for its symlinks. Mirror that conversion
     in the generic code and the NFS code. That lets us get rid of a few
     folio->page->folio conversions in this path, and some of the few
     remaining users of read_cache_page() / read_mapping_page()

   - Try and make a few filesystem operations killable on the VFS
     inode->i_mutex level

   - Add sysctl vfs_cache_pressure_denom for bulk file operations

     Some workloads need to preserve more dentries than we currently
     allow through out sysctl interface

     A HDFS servers with 12 HDDs per server, on a HDFS datanode startup
     involves scanning all files and caching their metadata (including
     dentries and inodes) in memory. Each HDD contains approximately 2
     million files, resulting in a total of ~20 million cached dentries
     after initialization

     To minimize dentry reclamation, they set vfs_cache_pressure to 1.
     Despite this configuration, memory pressure conditions can still
     trigger reclamation of up to 50% of cached dentries, reducing the
     cache from 20 million to approximately 10 million entries. During
     the subsequent cache rebuild period, any HDFS datanode restart
     operation incurs substantial latency penalties until full cache
     recovery completes

     To maintain service stability, more dentries need to be preserved
     during memory reclamation. The current minimum reclaim ratio (1/100
     of total dentries) remains too aggressive for such workload. This
     patch introduces vfs_cache_pressure_denom for more granular cache
     pressure control

     The configuration [vfs_cache_pressure=1,
     vfs_cache_pressure_denom=10000] effectively maintains the full 20
     million dentry cache under memory pressure, preventing datanode
     restart performance degradation

   - Avoid some jumps in inode_permission() using likely()/unlikely()

   - Avid a memory access which is most likely a cache miss when
     descending into devcgroup_inode_permission()

   - Add fastpath predicts for stat() and fdput()

   - Anonymous inodes currently don't come with a proper mode causing
     issues in the kernel when we want to add useful VFS debug assert.
     Fix that by giving them a proper mode and masking it off when we
     report it to userspace which relies on them not having any mode

   - Anonymous inodes currently allow to change inode attributes because
     the VFS falls back to simple_setattr() if i_op->setattr isn't
     implemented. This means the ownership and mode for every single
     user of anon_inode_inode can be changed. Block that as it's either
     useless or actively harmful. If specific ownership is needed the
     respective subsystem should allocate anonymous inodes from their
     own private superblock

   - Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock

   - Add proper tests for anonymous inode behavior

   - Make it easy to detect proper anonymous inodes and to ensure that
     we can detect them in codepaths such as readahead()

  Cleanups:

   - Port pidfs to the new anon_inode_{g,s}etattr() helpers

   - Try to remove the uselib() system call

   - Add unlikely branch hint return path for poll

   - Add unlikely branch hint on return path for core_sys_select

   - Don't allow signals to interrupt getdents copying for fuse

   - Provide a size hint to dir_context for during readdir()

   - Use writeback_iter directly in mpage_writepages

   - Update compression and mtime descriptions in initramfs
     documentation

   - Update main netfs API document

   - Remove useless plus one in super_cache_scan()

   - Remove unnecessary NULL-check guards during setns()

   - Add separate separate {get,put}_cgroup_ns no-op cases

  Fixes:

   - Fix typo in root= kernel parameter description

   - Use KERN_INFO for infof()|info_plog()|infofc()

   - Correct comments of fs_validate_description()

   - Mark an unlikely if condition with unlikely() in
     vfs_parse_monolithic_sep()

   - Delete macro fsparam_u32hex()

   - Remove unused and problematic validate_constant_table()

   - Fix potential unsigned integer underflow in fs_name()

   - Make file-nr output the total allocated file handles"

* tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits)
  fs: Pass a folio to page_put_link()
  nfs: Use a folio in nfs_get_link()
  fs: Convert __page_get_link() to use a folio
  fs/read_write: make default_llseek() killable
  fs/open: make do_truncate() killable
  fs/open: make chmod_common() and chown_common() killable
  include/linux/fs.h: add inode_lock_killable()
  readdir: supply dir_context.count as readdir buffer size hint
  vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
  fuse: don't allow signals to interrupt getdents copying
  Documentation: fix typo in root= kernel parameter description
  include/cgroup: separate {get,put}_cgroup_ns no-op case
  kernel/nsproxy: remove unnecessary guards
  fs: use writeback_iter directly in mpage_writepages
  fs: remove useless plus one in super_cache_scan()
  fs: add S_ANON_INODE
  fs: remove uselib() system call
  device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()
  fs/fs_parse: Remove unused and problematic validate_constant_table()
  fs: touch up predicts in inode_permission()
  ...
2025-05-26 09:02:39 -07:00
Linus Torvalds 6d5b940e1e vfs-6.16-rc1.async.dir
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBN6wAKCRCRxhvAZXjc
 ok32AQD9DTiSCAoVg+7s+gSBuLTi8drPTN++mCaxdTqRh5WpRAD9GVyrGQT0s6LH
 eo9bm8d1TAYjilEWM0c0K0TxyQ7KcAA=
 =IW7H
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs directory lookup updates from Christian Brauner:
 "This contains cleanups for the lookup_one*() family of helpers.

  We expose a set of functions with names containing "lookup_one_len"
  and others without the "_len". This difference has nothing to do with
  "len". It's rater a historical accident that can be confusing.

  The functions without "_len" take a "mnt_idmap" pointer. This is found
  in the "vfsmount" and that is an important question when choosing
  which to use: do you have a vfsmount, or are you "inside" the
  filesystem. A related question is "is permission checking relevant
  here?".

  nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len
  functions. They pass nop_mnt_idmap and refuse to work on filesystems
  which have any other idmap.

  This work changes nfsd and cachefile to use the lookup_one family of
  functions and to explictily pass &nop_mnt_idmap which is consistent
  with all other vfs interfaces used where &nop_mnt_idmap is explicitly
  passed.

  The remaining uses of the "_one" functions do not require permission
  checks so these are renamed to be "_noperm" and the permission
  checking is removed.

  This series also changes these lookup function to take a qstr instead
  of separate name and len. In many cases this simplifies the call"

* tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  VFS: change lookup_one_common and lookup_noperm_common to take a qstr
  Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS
  VFS: rename lookup_one_len family to lookup_noperm and remove permission check
  cachefiles: Use lookup_one() rather than lookup_one_len()
  nfsd: Use lookup_one() rather than lookup_one_len()
  VFS: improve interface for lookup_one functions
2025-05-26 08:02:43 -07:00
Linus Torvalds 0f8c0258bf 22 hotfixes. 13 are cc:stable and the remainder address post-6.14 issues
or aren't considered necessary for -stable kernels.  19 are for MM.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaDLNqwAKCRDdBJ7gKXxA
 juanAQD4aZn7ACTpbIgDIlLVJouq6OOHEYye9hhxz19UN2mAUgEAn8jPqvBDav3S
 HxjMFSdgLUQVO03FCs9tpNJchi69nw0=
 =R3UI
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-05-25-00-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
 "22 hotfixes.

  13 are cc:stable and the remainder address post-6.14 issues or aren't
  considered necessary for -stable kernels. 19 are for MM"

* tag 'mm-hotfixes-stable-2025-05-25-00-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
  mailmap: add Jarkko's employer email address
  mm: fix copy_vma() error handling for hugetlb mappings
  memcg: always call cond_resched() after fn()
  mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios
  mm: vmalloc: only zero-init on vrealloc shrink
  mm: vmalloc: actually use the in-place vrealloc region
  alloc_tag: allocate percpu counters for module tags dynamically
  module: release codetag section when module load fails
  mm/cma: make detection of highmem_start more robust
  MAINTAINERS: add mm memory policy section
  MAINTAINERS: add mm ksm section
  kasan: avoid sleepable page allocation from atomic context
  highmem: add folio_test_partial_kmap()
  MAINTAINERS: add hung-task detector section
  taskstats: fix struct taskstats breaks backward compatibility since version 15
  mm/truncate: fix out-of-bounds when doing a right-aligned split
  MAINTAINERS: add mm reclaim section
  MAINTAINERS: update page allocator section
  mm: fix VM_UFFD_MINOR == VM_SHADOW_STACK on USERFAULTFD=y && ARM64_GCS=y
  mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled
  ...
2025-05-25 07:48:35 -07:00
Ingo Molnar 94ec70880f Merge branch 'locking/futex' into locking/core, to pick up pending futex changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-25 10:10:08 +02:00
David Wang 221fcbf775 module: release codetag section when module load fails
When module load fails after memory for codetag section is ready, codetag
section memory will not be properly released.  This causes memory leak,
and if next module load happens to get the same module address, codetag
may pick the uninitialized section when manipulating tags during module
unload, and leads to "unable to handle page fault" BUG.

Link: https://lkml.kernel.org/r/20250519163823.7540-1-00107082@163.com
Fixes: 0db6f8d782 ("alloc_tag: load module tags into separate contiguous memory")
Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/
Signed-off-by: David Wang <00107082@163.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25 00:53:47 -07:00
Mykyta Yatsenko 079e5c56a5 bpf: Fix error return value in bpf_copy_from_user_dynptr
On error, copy_from_user returns number of bytes not copied to
destination, but current implementation of copy_user_data_sleepable does
not handle that correctly and returns it as error value, which may
confuse user, expecting meaningful negative error value.

Fixes: a498ee7576 ("bpf: Implement dynptr copy kfuncs")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250523181705.261585-1-mykyta.yatsenko5@gmail.com
2025-05-23 13:25:02 -07:00
Lorenz Bauer a539e2a6d5 btf: Allow mmap of vmlinux btf
User space needs access to kernel BTF for many modern features of BPF.
Right now each process needs to read the BTF blob either in pieces or
as a whole. Allow mmaping the sysfs file so that processes can directly
access the memory allocated for it in the kernel.

remap_pfn_range is used instead of vm_insert_page due to aarch64
compatibility issues.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://lore.kernel.org/bpf/20250520-vmlinux-mmap-v5-1-e8c941acc414@isovalent.com
2025-05-23 10:06:28 -07:00
Ritesh Harjani (IBM) 927244f6ef traceevent/block: Add REQ_ATOMIC flag to block trace events
Filesystems like XFS can implement atomic write I/O using either
REQ_ATOMIC flag set in the bio or via CoW operation. It will be useful
if we have a flag in trace events to distinguish between the two. This
patch adds char 'U' (Untorn writes) to rwbs field of the trace events
if REQ_ATOMIC flag is set in the bio.

<W/ REQ_ATOMIC>
=================
xfs_io-4238    [009] .....  4148.126843: block_rq_issue: 259,0 WFSU 16384 () 768 + 32 none,0,0 [xfs_io]
<idle>-0       [009] d.h1.  4148.129864: block_rq_complete: 259,0 WFSU () 768 + 32 none,0,0 [0]

<W/O REQ_ATOMIC>
===============
xfs_io-4237    [010] .....  4143.325616: block_rq_issue: 259,0 WS 16384 () 768 + 32 none,0,0 [xfs_io]
<idle>-0       [010] d.H1.  4143.329138: block_rq_complete: 259,0 WS () 768 + 32 none,0,0 [0]

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/44317cb2ec4588f6a2c1501a96684e6a1196e8ba.1747921498.git.ritesh.list@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23 09:18:48 -06:00
Roger Pau Monne d24c6b78ac xen: enable XEN_UNPOPULATED_ALLOC as part of xen.config
PVH dom0 is useless without XEN_UNPOPULATED_ALLOC, as otherwise it will
very likely balloon out all dom0 memory to map foreign and grant pages.

Enable it by default as part of xen.config.  This also requires enabling
MEMORY_HOTREMOVE and ZONE_DEVICE.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20250514092037.28970-1-roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2025-05-23 07:07:09 +02:00
Tejun Heo 273cc94965 sched_ext: Call ops.update_idle() after updating builtin idle bits
BPF schedulers that use both builtin CPU idle mechanism and
ops.update_idle() may want to use the latter to create interlocking between
ops.enqueue() and CPU idle transitions so that either ops.enqueue() sees the
idle bit or ops.update_idle() sees the task queued somewhere. This can
prevent race conditions where CPUs go idle while tasks are waiting in DSQs.

For such interlocking to work, ops.update_idle() must be called after
builtin CPU masks are updated. Relocate the invocation. Currently, there are
no ordering requirements on transitions from idle and this relocation isn't
expected to make meaningful differences in that direction.

This also makes the ops.update_idle() behavior semantically consistent:
any action performed in this callback should be able to override the
builtin idle state, not the other way around.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-and-tested-by: Andrea Righi <arighi@nvidia.com>
Acked-by: Changwoo Min <changwoo@igalia.com>
2025-05-22 09:25:15 -10:00
Tejun Heo 82648b8b2a sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier
Replace explicit cgroup_bpf_inherit/offline() calls from cgroup
creation/destruction paths with notification callback registered on
cgroup_lifetime_notifier.

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-22 09:20:19 -10:00
Tejun Heo 9e8c67a9e5 sched_ext: Introduce cgroup_lifetime_notifier
Other subsystems may make use of the cgroup hierarchy with the cgroup_bpf
support being one such example. For such a feature, it's useful to be able
to hook into cgroup creation and destruction paths to perform
feature-specific initializations and cleanups.

Add cgroup_lifetime_notifier which generates CGROUP_LIFETIME_ONLINE and
CGROUP_LIFETIME_OFFLINE events whenever cgroups are created and destroyed,
respectively.

The next patch will convert cgroup_bpf to use the new notifier and other
uses are planned.

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-22 09:20:11 -10:00
Tejun Heo cd22cbad1b cgroup: Minor reorganization of cgroup_create()
cgroup_bpf init and exit handling will be moved to a notifier chain. In
prepartion, reorganize cgroup_create() a bit so that the new cgroup is fully
initialized before any outside changes are made.

- cgrp->ancestors[] initialization and the hierarchical nr_descendants and
  nr_frozen_descendants updates were in the same loop. Separate them out and
  do the former earlier and do the latter later.

- Relocate cgroup_bpf_inherit() call so that it's after all cgroup
  initializations are complete.

No visible behavior changes expected.

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-22 09:19:57 -10:00
Jakub Kicinski 33e1b1b399 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc8).

Conflicts:
  80f2ab46c2 ("irdma: free iwdev->rf after removing MSI-X")
  4bcc063939 ("ice, irdma: fix an off by one in error handling code")
  c24a65b6a2 ("iidc/ice/irdma: Update IDC to support multiple consumers")
https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au

No extra adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-22 09:42:41 -07:00