Commit Graph

82655 Commits (master)

Author SHA1 Message Date
Linus Torvalds 4d38b88fd1 printk changes for 6.19
-----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmktlbUbFIAAAAAABAAO
 bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyevsP/1z98/wfCaSCquIq4H8S
 OTqFGybGgYQt1NmMj2cGPpbAE3LJNYORT0A4tcoqOTy1Z5xbQz63rO3clSI/e7Mf
 n4ZZ7NvkE40i8et1BjqtZa9dSkAv4QLYH73KrtNeuTr5tqvHo1x8FakUH6gQnb1k
 QOOebvbVXnOb+rh89j1GZShrLFcCil0psjp165WHAYE/3PyFBgYGLMCgwLqS+W3H
 re5Q4sl/ySXpMFF/XN1Kww48FWxy/h+YQFCxZwuWlUcXtVjqZ+BN+keb7AqaFQ7R
 dC2exV2W0RBoupEJR/FWHoXrm/bDDLhzqRaMvoggLJrMJ9L6V0WdIhaFA4qzoG63
 paJGFjUfmDX3dpPsAddq7kKeevCz4a2/HwFKhiBqqq4tdHuely7wZgnoFO7ovgmu
 DYDCXHtpJuWZR3WJ5I/V/sJ9i9KFXhhyWcKVf13QTAFiCaA09aeSAcUWNYNaaxbn
 nu6IkUxdIVnWIEBgcYH6jz1DrPGreYLYuD4bVb2gdZoP0r3tnMpG6xfSNIUueSGd
 VFAKW9PJYaj7Id+jgACH6V+gQ22L600xJDdL1bPjRbGE0LD7vlz2F1MZTq3BFJFn
 hUxJeOZplHX+TPophdvH4MO9VLmydWLUyJiDBP1yA8M9XZms/5s7IJJ1RYXqUCcf
 qEB4L7W1+Qy1R/lzf2PU9X4R
 =FnfO
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Allow creating nbcon console drivers with an unsafe write_atomic()
   callback that can only be called by the final nbcon_atomic_flush_unsafe().
   Otherwise, the driver would rely on the kthread.

   It is going to be used as a best-effort approach for an
   experimental nbcon netconsole driver, see

     https://lore.kernel.org/r/20251121-nbcon-v1-2-503d17b2b4af@debian.org

   Note that a safe .write_atomic() callback is supposed to work in NMI
   context. But some networking drivers are not safe even in IRQ
   context:

     https://lore.kernel.org/r/oc46gdpmmlly5o44obvmoatfqo5bhpgv7pabpvb6sjuqioymcg@gjsma3ghoz35

   In an ideal world, all networking drivers would be fixed first and
   the atomic flush would be blocked only in NMI context. But it raises
   the question of how reliable networking drivers are when the system is
   in a bad state. They might block flushing of more reliable serial
   consoles, which are more suitable for serious debugging anyway.

 - Allow use of the last 4 bytes of the printk ring buffer.

 - Prevent queuing IRQ work and block printk kthreads when consoles are
   suspended. Otherwise, they create unnecessary churn or even block
   the suspend.

 - Release console_lock() between each record in the kthread used for
   legacy consoles on RT. It might significantly speed up the boot.

 - Release nbcon context between each record in the atomic flush. It
   prevents stalls of the related printk kthread after it has lost the
   ownership in the middle of a record

 - Add support for NBCON consoles into KDB

 - Add %ptSp modifier for printing struct timespec64 and use it where
   possible

 - Misc code clean up
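
 [ Editor's note: a minimal sketch of the new specifier, assuming %ptSp
   takes a pointer to struct timespec64 as the shortlog below suggests;
   the exact rendered format is defined by the vsprintf change itself:

      /* hedged illustration, not code from this pull */
      struct timespec64 ts;

      ktime_get_real_ts64(&ts);
      pr_info("event time: %ptSp\n", &ts);
   ]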

* tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (48 commits)
  printk: Use console_is_usable on console_unblank
  arch: um: kmsg_dump: Use console_is_usable
  drivers: serial: kgdboc: Drop checks for CON_ENABLED and CON_BOOT
  lib/vsprintf: Unify FORMAT_STATE_NUM handlers
  printk: Avoid irq_work for printk_deferred() on suspend
  printk: Avoid scheduling irq_work on suspend
  printk: Allow printk_trigger_flush() to flush all types
  tracing: Switch to use %ptSp
  scsi: snic: Switch to use %ptSp
  scsi: fnic: Switch to use %ptSp
  s390/dasd: Switch to use %ptSp
  ptp: ocp: Switch to use %ptSp
  pps: Switch to use %ptSp
  PCI: epf-test: Switch to use %ptSp
  net: dsa: sja1105: Switch to use %ptSp
  mmc: mmc_test: Switch to use %ptSp
  media: av7110: Switch to use %ptSp
  ipmi: Switch to use %ptSp
  igb: Switch to use %ptSp
  e1000e: Switch to use %ptSp
  ...
2025-12-03 12:42:36 -08:00
Jakub Kicinski 4de4454299 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes in preparation for the net-next PR.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-02 15:37:53 -08:00
Pavel Begunkov 9954464d73 net: page_pool: sanitise allocation order
We're going to give more control over rx buffer sizes to user space, and
since we can't always rely on driver validation, let's sanitise it in
page_pool_init() as well. Note that we only need to reject over
MAX_PAGE_ORDER allocations for normal page pools, as current memory
providers don't need to use the buddy allocator and must check the order
on init.
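
[ Editor's note: a hedged sketch of the kind of check described above;
  the field names, the exact placement in page_pool_init(), and the
  error code are assumptions, not the actual diff:

    /* only "normal" page pools (no memory provider) go through the
     * buddy allocator, so only they need the MAX_PAGE_ORDER bound */
    if (!pool->mp_ops && pool->p.order > MAX_PAGE_ORDER)
            return -EINVAL;
  ]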

Suggested-by: Stanislav Fomichev <stfomichev@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/77ad83c1aec66cbd00e7b3952f74bc3b7a988150.1764542851.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-02 11:08:39 -08:00
Pavel Begunkov 854858848b net: page pool: xa init with destroy on pp init
The free_ptr_ring label path initialises ->dma_mapped xarray but doesn't
destroy it in case of an error. That's not a real problem since init
itself doesn't do anything requiring destruction, but still match it
with xa_destroy() to silence warnings.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/02904c6d83dbe5cc1c671106a5c97bd93ab31006.1764542851.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-02 11:08:39 -08:00
Linus Torvalds 1dce50698a Scoped user mode access and related changes:
- Implement the missing u64 user access function on ARM when
    CONFIG_CPU_SPECTRE=n. This makes it possible to access a 64bit value in
    generic code with [unsafe_]get_user(). All other architectures and ARM
    variants provide the relevant accessors already.
 
  - Ensure that ASM GOTO jump label usage in the user mode access helpers
    always goes through a local C scope label indirection inside the
    helpers. This is required because compilers do not support an ASM
    GOTO target leaving an auto cleanup scope. GCC silently fails to emit
    the cleanup invocation and Clang fails the build.
 
    This provides generic wrapper macros and the conversion of affected
    architecture code to use them.
 
  - Scoped user mode access with auto cleanup
 
    Access to user mode memory can be required in hot code paths, but if it
    has to be done with user controlled pointers, the access is shielded
    with a speculation barrier, so that the CPU cannot speculate around the
    address range check. Those speculation barriers impact performance quite
    significantly. This can be avoided by "masking" the provided pointer so
    it is guaranteed to be in the valid user memory access range and
    otherwise to point to a guaranteed unpopulated address space. This has
    to be done without branches so it creates an address dependency for the
    access, which the CPU cannot speculate ahead.
 
    This results in repetitive and error-prone programming patterns:
 
             if (can_do_masked_user_access())
                     from = masked_user_read_access_begin((from));
             else if (!user_read_access_begin(from, sizeof(*from)))
                     return -EFAULT;
             unsafe_get_user(val, from, Efault);
             user_read_access_end();
             return 0;
       Efault:
             user_read_access_end();
             return -EFAULT;
 
     which can be replaced with scopes and automatic cleanup:
 
             scoped_user_read_access(from, Efault)
                     unsafe_get_user(val, from, Efault);
             return 0;
        Efault:
             return -EFAULT;
 
  - Convert code which implements the above pattern over to
    scoped_user_*_access(). This also corrects a couple of imbalanced
    masked_*_begin() instances which are harmless on most architectures, but
    prevent PowerPC from implementing the masking optimization.
 
  - Add a missing speculation barrier in copy_from_user_iter()
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksRfITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVhBEACEySjWcyCrD1e0ZFMFAOJZFI2BShav
 reotzCzmHYQdpVukDRxc64BgM2vN4yB04xnyMhi2o4hSTiIJhz1NzbKggsQJhVoA
 psYz+xEI161HuLZnUBUBuF9RRko/HVsbGqO2JFCuOKor4GCycvjVgupR3EIN9h5T
 HZEWGIgaTmN7MBj0QRrJgJkaaSTnPKOwWaNMV/F9pfk27zuB7vuV8WM9P3FaJYG+
 JGa9td7VGaBpWavxgMJqfdvXWBCVDDfZ1dunWx8tPTnLxKZZZD6HlfQXhZTr2n1e
 rtJpGgfVBx5Uqxn4RrhS0I7QeK1b9rrt3IU7EkFoaa3Z8LU5B7cHlm7KyicyoHhy
 SzFFUszssznT/0OhA5fmgPRlqI295HynW2p1L4Xy9hC0EZ2vXJPG5rO6X3x6QwSR
 asjRB7x/6JzWQUzE7/nhXd9KcB66wvQxhnjp7GqulF74aPBCtIdXXDD68YEDYkbi
 dPC3NRBr0ePbsGVGWbYvYIPWcvo1u814C2io1zKwmVbiN6lCYURgQK861vfAZUP8
 oP5D2a6ENgezDKoJo6eJ82inuDu64qZy7OOkU/aO3cbOuWGVyY9CjYD11x85Nr0k
 UNabSOfvcmhmobtYUiAgLLrjX1grQUG3F74ZQTw513mwgMObuDAAoS11GPjY6HL6
 b99WUJRv8jP66A==
 =6no0
 -----END PGP SIGNATURE-----

Merge tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scoped user access updates from Thomas Gleixner:
 "Scoped user mode access and related changes:

   - Implement the missing u64 user access function on ARM when
     CONFIG_CPU_SPECTRE=n.

     This makes it possible to access a 64bit value in generic code with
     [unsafe_]get_user(). All other architectures and ARM variants
     provide the relevant accessors already.

   - Ensure that ASM GOTO jump label usage in the user mode access
     helpers always goes through a local C scope label indirection
     inside the helpers.

      This is required because compilers do not support an ASM GOTO
      target leaving an auto cleanup scope. GCC silently fails to emit
      the cleanup invocation and Clang fails the build.

     [ Editor's note: gcc-16 will have fixed the code generation issue
       in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm
       goto jumps [PR122835]"). But we obviously have to deal with clang
       and older versions of gcc, so.. - Linus ]

     This provides generic wrapper macros and the conversion of affected
     architecture code to use them.

   - Scoped user mode access with auto cleanup

     Access to user mode memory can be required in hot code paths, but
     if it has to be done with user controlled pointers, the access is
     shielded with a speculation barrier, so that the CPU cannot
     speculate around the address range check. Those speculation
     barriers impact performance quite significantly.

     This cost can be avoided by "masking" the provided pointer so it is
     guaranteed to be in the valid user memory access range and
     otherwise to point to a guaranteed unpopulated address space. This
     has to be done without branches so it creates an address dependency
     for the access, which the CPU cannot speculate ahead.

      This results in repetitive and error-prone programming patterns:

              if (can_do_masked_user_access())
                      from = masked_user_read_access_begin((from));
              else if (!user_read_access_begin(from, sizeof(*from)))
                      return -EFAULT;
              unsafe_get_user(val, from, Efault);
              user_read_access_end();
              return 0;
        Efault:
              user_read_access_end();
              return -EFAULT;

      which can be replaced with scopes and automatic cleanup:

              scoped_user_read_access(from, Efault)
                      unsafe_get_user(val, from, Efault);
              return 0;
         Efault:
              return -EFAULT;

   - Convert code which implements the above pattern over to
     scoped_user_*_access(). This also corrects a couple of imbalanced
     masked_*_begin() instances which are harmless on most
     architectures, but prevent PowerPC from implementing the masking
     optimization.

   - Add a missing speculation barrier in copy_from_user_iter()"
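
 [ Editor's note: the quoted pattern put into a complete, hedged helper;
   the function name is illustrative, while scoped_user_read_access()
   and unsafe_get_user() are taken from the pull message itself:

       /* read one u32 from user space with the scoped access pattern */
       static int fetch_user_u32(u32 *val, const u32 __user *from)
       {
               scoped_user_read_access(from, Efault)
                       unsafe_get_user(*val, from, Efault);
               return 0;
       Efault:
               return -EFAULT;
       }
   ]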

* tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required
  scm: Convert put_cmsg() to scoped user access
  iov_iter: Add missing speculation barrier to copy_from_user_iter()
  iov_iter: Convert copy_from_user_iter() to masked user access
  select: Convert to scoped user access
  x86/futex: Convert to scoped user access
  futex: Convert to get/put_user_inline()
  uaccess: Provide put/get_user_inline()
  uaccess: Provide scoped user access regions
  arm64: uaccess: Use unsafe wrappers for ASM GOTO
  s390/uaccess: Use unsafe wrappers for ASM GOTO
  riscv/uaccess: Use unsafe wrappers for ASM GOTO
  powerpc/uaccess: Use unsafe wrappers for ASM GOTO
  x86/uaccess: Use unsafe wrappers for ASM GOTO
  uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user()
  ARM: uaccess: Implement missing __get_user_asm_dword()
2025-12-02 08:01:39 -08:00
Xiang Mei 9fefc78f7f net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
In cake_drop(), qdisc_tree_reduce_backlog() is used to update the qlen
and backlog of the qdisc hierarchy. Its caller, cake_enqueue(), assumes
that the parent qdisc will enqueue the current packet. However, this
assumption breaks when cake_enqueue() returns NET_XMIT_CN: the parent
qdisc stops enqueuing the current packet, leaving the tree qlen/backlog
accounting inconsistent. This mismatch can lead to a NULL dereference
(e.g., when the parent Qdisc is qfq_qdisc).

This patch computes the qlen/backlog delta in a more robust way by
observing the difference before and after the series of cake_drop()
calls, and then compensates the qdisc tree accounting if cake_enqueue()
returns NET_XMIT_CN.

To ensure correct compensation when ACK thinning is enabled, a new
variable is introduced to keep qlen unchanged.

Fixes: 15de71d06a ("net/sched: Make cake_enqueue return NET_XMIT_CN when past buffer_limit")
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20251128001415.377823-1-xmei5@asu.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-02 13:28:00 +01:00
Linus Torvalds 1b5dd29869 vfs-6.19-rc1.fd_prepare.fs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZwAKCRCRxhvAZXjc
 op0AAP4oNVJkFyvgKoPos5K2EGFB8M7merGhpYtsOoeg8UK6OwD/UySQErHsXQDR
 sUDDa5uFOhfrkcfM8REtAN4wF8p5qAc=
 =QgFD
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.19-rc1.fd_prepare.fs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull fd prepare updates from Christian Brauner:
 "This adds the FD_ADD() and FD_PREPARE() primitive. They simplify the
  common pattern of get_unused_fd_flags() + create file + fd_install()
  that is used extensively throughout the kernel and currently requires
  cumbersome cleanup paths.

  FD_ADD() - For simple cases where a file is installed immediately:

      fd = FD_ADD(O_CLOEXEC, vfio_device_open_file(device));
      if (fd < 0)
          vfio_device_put_registration(device);
      return fd;

  FD_PREPARE() - For cases requiring access to the fd or file, or
  additional work before publishing:

      FD_PREPARE(fdf, O_CLOEXEC, sync_file->file);
      if (fdf.err) {
          fput(sync_file->file);
          return fdf.err;
      }

      data.fence = fd_prepare_fd(fdf);
      if (copy_to_user((void __user *)arg, &data, sizeof(data)))
          return -EFAULT;

      return fd_publish(fdf);

  The primitives are centered around struct fd_prepare. FD_PREPARE()
  encapsulates all allocation and cleanup logic and must be followed by
  a call to fd_publish() which associates the fd with the file and
  installs it into the caller's fdtable. If fd_publish() isn't called,
  both are deallocated automatically. FD_ADD() is a shorthand that does
  fd_publish() immediately and never exposes the struct to the caller.

  I've implemented this in a way that it's compatible with the cleanup
  infrastructure while also being usable separately. IOW, it's centered
  around struct fd_prepare which is aliased to class_fd_prepare_t and so
   we can make use of all the basic guard infrastructure"

* tag 'vfs-6.19-rc1.fd_prepare.fs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits)
  io_uring: convert io_create_mock_file() to FD_PREPARE()
  file: convert replace_fd() to FD_PREPARE()
  vfio: convert vfio_group_ioctl_get_device_fd() to FD_ADD()
  tty: convert ptm_open_peer() to FD_ADD()
  ntsync: convert ntsync_obj_get_fd() to FD_PREPARE()
  media: convert media_request_alloc() to FD_PREPARE()
  hv: convert mshv_ioctl_create_partition() to FD_ADD()
  gpio: convert linehandle_create() to FD_PREPARE()
  pseries: port papr_rtas_setup_file_interface() to FD_ADD()
  pseries: convert papr_platform_dump_create_handle() to FD_ADD()
  spufs: convert spufs_gang_open() to FD_PREPARE()
  papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()
  spufs: convert spufs_context_open() to FD_PREPARE()
  net/socket: convert __sys_accept4_file() to FD_ADD()
  net/socket: convert sock_map_fd() to FD_ADD()
  net/kcm: convert kcm_ioctl() to FD_PREPARE()
  net/handshake: convert handshake_nl_accept_doit() to FD_PREPARE()
  secretmem: convert memfd_secret() to FD_ADD()
  memfd: convert memfd_create() to FD_ADD()
  bpf: convert bpf_token_create() to FD_PREPARE()
  ...
2025-12-01 17:32:07 -08:00
Jakub Kicinski 4a18b6cd7c bluetooth-next pull request for net-next:
core:
 
  - HCI: Add initial support for PAST
  - hci_core: Introduce HCI_CONN_FLAG_PAST
  - ISO: Add support to bind to trigger PAST
  - HCI: Always use the identity address when initializing a connection
  - ISO: Attempt to resolve broadcast address
  - MGMT: Allow use of Set Device Flags without Add Device
  - ISO: Fix not updating BIS sender source address
  - HCI: Add support for LL Extended Feature Set
 
  driver:
 
  - btusb: Add new VID/PID 2b89/6275 for RTL8761BUV
  - btusb: MT7920: Add VID/PID 0489/e135
  - btusb: MT7922: Add VID/PID 0489/e170
  - btusb: Add new VID/PID 13d3/3533 for RTL8821CE
  - btusb: Add new VID/PID 0x0489/0xE12F for RTL8852BE-VT
  - btusb: Add new VID/PID 0x13d3/0x3618 for RTL8852BE-VT
  - btusb: Add new VID/PID 0x13d3/0x3619 for RTL8852BE-VT
  - btusb: Reclassify Qualcomm WCN6855 debug packets
  - btintel_pcie: Introduce HCI Driver protocol
  - btintel_pcie: Support for S4 (Hibernate)
  - btintel_pcie: Suspend/Resume: Controller doorbell interrupt handling
  - dt-bindings: net: Convert Marvell 8897/8997 bindings to DT schema
  - btbcm: Use kmalloc_array() to prevent overflow
  - btrtl: Add the support for RTL8761CUV
  - hci_h5: avoid sending two SYNC messages
  - hci_h5: implement CRC data integrity
 
 MAINTAINERS:
 
  - Add Bartosz Golaszewski as Qualcomm hci_qca maintainer
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmkuCjwZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKdOvD/9ReyEoUaUZ6aKC7TO66fT0
 eo/sga036O3k9oeeECsEBOOgwOy86gbR7uGol/vV6mocm/Duql/pNqMrgC28Cxhy
 u+aGX7sf3s8Pc/e82Syx6u+QPKa6xz3Hnx1UdCM/HXb0nOkJUm3VHFmshFxmXE9g
 xyMWKtLp1DrXOZNLauR/p8fgAEhGQs8muICuWT/SrEbXZ4+coQoz2h6279IA9FGZ
 2qj/pcLIBFk0qePkiaZ5LJGjF0P0+S9uX9XlhF9yIsLIckH8qAbPsHpD1+RsbkId
 R4WYxIBTVeeUhvWQezodsZJa8HRFQendCeBq8QhOo6fEcprpxrb4/8NS6hipblfy
 rTeyKUoPmPZ/5/nvx9pbRSqqdVOVT/pEOzD5o66bppX7s7Xq3k/WIRqCKpM4znIp
 iZyUlaoX6J5R+SHaOLVzRun6oUzXnIHbIGbWopfQBtMgpDajTcxQJpsTWdFQW4El
 RP+N9xoF+OBa4pKumKCe11pzUJOkBislDIAjNvp8wmevfSfkmo8CbqK5WwfL0rar
 VeBDlkfPYLVNiJ30WKwmLC4ymRvzAFhC0R6LoDPw8OWTZ7VBc7ReCcHDp8GBWbew
 pKe7jMWCCOo1q9zx0KBAwghQ6ZbABnK6N6Fg/Wpb/kfz7YkqnsRWRJkI2BTotEdu
 +bIZ2MwuJ59ROnfPy0tNBw==
 =ndah
 -----END PGP SIGNATURE-----

Merge tag 'for-net-next-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

core:

 - HCI: Add initial support for PAST
 - hci_core: Introduce HCI_CONN_FLAG_PAST
 - ISO: Add support to bind to trigger PAST
 - HCI: Always use the identity address when initializing a connection
 - ISO: Attempt to resolve broadcast address
 - MGMT: Allow use of Set Device Flags without Add Device
 - ISO: Fix not updating BIS sender source address
 - HCI: Add support for LL Extended Feature Set

 driver:

 - btusb: Add new VID/PID 2b89/6275 for RTL8761BUV
 - btusb: MT7920: Add VID/PID 0489/e135
 - btusb: MT7922: Add VID/PID 0489/e170
 - btusb: Add new VID/PID 13d3/3533 for RTL8821CE
 - btusb: Add new VID/PID 0x0489/0xE12F for RTL8852BE-VT
 - btusb: Add new VID/PID 0x13d3/0x3618 for RTL8852BE-VT
 - btusb: Add new VID/PID 0x13d3/0x3619 for RTL8852BE-VT
 - btusb: Reclassify Qualcomm WCN6855 debug packets
 - btintel_pcie: Introduce HCI Driver protocol
 - btintel_pcie: Support for S4 (Hibernate)
 - btintel_pcie: Suspend/Resume: Controller doorbell interrupt handling
 - dt-bindings: net: Convert Marvell 8897/8997 bindings to DT schema
 - btbcm: Use kmalloc_array() to prevent overflow
 - btrtl: Add the support for RTL8761CUV
 - hci_h5: avoid sending two SYNC messages
 - hci_h5: implement CRC data integrity

MAINTAINERS:

 - Add Bartosz Golaszewski as Qualcomm hci_qca maintainer

* tag 'for-net-next-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (29 commits)
  Bluetooth: btusb: Add new VID/PID 13d3/3533 for RTL8821CE
  Bluetooth: HCI: Add support for LL Extended Feature Set
  drivers/bluetooth: btbcm: Use kmalloc_array() to prevent overflow
  Bluetooth: btintel_pcie: Introduce HCI Driver protocol
  Bluetooth: btusb: add new custom firmwares
  Bluetooth: btusb: Add new VID/PID 0x13d3/0x3619 for RTL8852BE-VT
  Bluetooth: btusb: Add new VID/PID 0x13d3/0x3618 for RTL8852BE-VT
  Bluetooth: btusb: Add new VID/PID 0x0489/0xE12F for RTL8852BE-VT
  Bluetooth: iso: fix socket matching ambiguity between BIS and CIS
  Bluetooth: MAINTAINERS: Add Bartosz Golaszewski as Qualcomm hci_qca maintainer
  Bluetooth: btrtl: Add the support for RTL8761CUV
  Bluetooth: Remove redundant pm_runtime_mark_last_busy() calls
  dt-bindings: net: Convert Marvell 8897/8997 bindings to DT schema
  Bluetooth: btusb: Reclassify Qualcomm WCN6855 debug packets
  Bluetooth: btusb: Add new VID/PID 2b89/6275 for RTL8761BUV
  Bluetooth: btintel_pcie: Suspend/Resume: Controller doorbell interrupt handling
  Bluetooth: btintel_pcie: Support for S4 (Hibernate)
  Bluetooth: btusb: MT7922: Add VID/PID 0489/e170
  Bluetooth: btusb: MT7920: Add VID/PID 0489/e135
  Bluetooth: ISO: Fix not updating BIS sender source address
  ...
====================

Link: https://patch.msgid.link/20251201213818.97249-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 17:10:52 -08:00
Vladimir Oltean 0e75bfe340 net: dsa: add simple HSR offload helpers
It turns out that HSR offloads are so fine-grained that many DSA
switches can do a small part even though they weren't specifically
designed for the protocols supported by that driver (HSR and PRP).

Specifically NETIF_F_HW_HSR_DUP - it is simple packet duplication on
transmit, towards all (i.e. both) member ports of the HSR device.

For many DSA switches, we know how to duplicate a packet, even though we
never typically use that feature. The transmit port mask from the
tagging protocol can have multiple bits set, and the switch should send
the packet once to every port with a bit set from that mask.

Nonetheless, not all tagging protocols are like this, and sometimes the
port is a single numeric value rather than a bit mask. For that reason,
and also because switches can sometimes change to a different tagging
protocol, we need to make HSR offload helpers opt-in.

For devices that can do nothing else HSR-specific, we introduce
dsa_port_simple_hsr_join() and dsa_port_simple_hsr_leave(). These
functions monitor when two user ports of the same switch are part of the
same HSR device, and when that condition is true, they toggle the
NETIF_F_HW_HSR_DUP feature flag of both net devices.

Normally only dsa_port_simple_hsr_join() and dsa_port_simple_hsr_leave()
are needed. The dsa_port_simple_hsr_validate() helper is just to see
what kind of configuration could be offloadable using the generic
helpers. This is used by switch drivers which are not currently using
the right tagging protocol to offload this HSR ring, but could in
principle offload it after changing the tagger.

Suggested-by: David Yang <mmyangfl@gmail.com>
Cc: "Alvin Šipraga" <alsi@bang-olufsen.dk>
Cc: Chester A. Unal" <chester.a.unal@arinc9.com>
Cc: "Clément Léger" <clement.leger@bootlin.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Jonas Gorski <jonas.gorski@gmail.com>
Cc: Kurt Kanzenbach <kurt@linutronix.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: UNGLinuxDriver@microchip.com
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251130131657.65080-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 16:45:07 -08:00
Vladimir Oltean bed59a86e9 net: dsa: avoid calling ds->ops->port_hsr_leave() when unoffloaded
This mirrors what we do in dsa_port_lag_leave() and
dsa_port_bridge_leave(): when ds->ops->port_hsr_join() returns
-EOPNOTSUPP, we fall back to a software implementation where dp->hsr_dev
is NULL, and the unoffloaded port is no longer bothered with calls from
the HSR layer.

This helps, for example, with interlink ports which current DSA drivers
don't know how to offload. We have to check only in port_hsr_join() for
the port type, then in port_hsr_leave() we are sure we're dealing only
with known port types.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251130131657.65080-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 16:45:06 -08:00
Xiaoliang Yang a0244e7621 net: hsr: create an API to get hsr port type
Since the introduction of HSR_PT_INTERLINK in commit 5055cccfc2 ("net:
hsr: Provide RedBox support (HSR-SAN)"), we see that different port
types require different settings for hardware offload, which was not the
case before when we only had HSR_PT_SLAVE_A and HSR_PT_SLAVE_B. But
there is currently no way to know which port is which type, so create
the hsr_get_port_type() API function and export it.

When hsr_get_port_type() is called from the device driver, the port
must be found in the HSR port list. An important use case is for this
function to work from offloading drivers' NETDEV_CHANGEUPPER handler,
which is triggered by hsr_portdev_setup() -> netdev_master_upper_dev_link().
Therefore, we need to move the addition of the hsr_port to the HSR port
list prior to calling hsr_portdev_setup(). This makes the error
restoration path also more similar to hsr_del_port(), where
kfree_rcu(port) is already used.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Lukasz Majewski <lukma@denx.de>
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Łukasz Majewski <lukma@nabladev.com>
Link: https://patch.msgid.link/20251130131657.65080-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 16:45:06 -08:00
Linus Torvalds db74a7d02a vfs-6.19-rc1.directory.delegations
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZgAKCRCRxhvAZXjc
 ooiEAPwNZfkqiSs6G1B2EmjFpMrA2BDqskaOsnN2sywra0sNewD9EQxJwlYXUn+z
 nNUIAvmegJGg2OiU2UaNGwxMR3lR3w8=
 =YELr
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull directory delegations update from Christian Brauner:
 "This contains the work for recall-only directory delegations for
  knfsd.

  Add support for simple, recallable-only directory delegations. This
  was decided at the fall NFS Bakeathon where the NFS client and server
  maintainers discussed how to merge directory delegation support.

  The approach starts with recallable-only delegations for several reasons:

   1. RFC8881 has gaps that are being addressed in RFC8881bis. In
      particular, it requires directory position information for
      CB_NOTIFY callbacks, which is difficult to implement properly
      under Linux. The spec is being extended to allow that information
      to be omitted.

   2. Client-side support for CB_NOTIFY still lags. The client side
      involves heuristics about when to request a delegation.

   3. Early indication shows simple, recallable-only delegations can
      help performance. Anna Schumaker mentioned seeing a multi-minute
      speedup in xfstests runs with them enabled.

  With these changes, userspace can also request a read lease on a
  directory that will be recalled on conflicting accesses. This may be
  useful for applications like Samba. Users can disable leases
  altogether via the fs.leases-enable sysctl if needed.

  VFS changes:

   - Dedicated Type for Delegations

     Introduce struct delegated_inode to track inodes that may have
     delegations that need to be broken. This replaces the previous
     approach of passing raw inode pointers through the delegation
     breaking code paths, providing better type safety and clearer
     semantics for the delegation machinery.

   - Break parent directory delegations in open(..., O_CREAT) codepath

   - Allow mkdir to wait for delegation break on parent

   - Allow rmdir to wait for delegation break on parent

   - Add try_break_deleg calls for parents to vfs_link(), vfs_rename(),
     and vfs_unlink()

   - Make vfs_create(), vfs_mknod(), and vfs_symlink() break delegations
     on parent directory

   - Clean up argument list for vfs_create()

   - Expose delegation support to userland

  Filelock changes:

   - Make lease_alloc() take a flags argument

   - Rework the __break_lease API to use flags

   - Add struct delegated_inode

   - Push the S_ISREG check down to ->setlease handlers

   - Lift the ban on directory leases in generic_setlease

  NFSD changes:

   - Allow filecache to hold S_IFDIR files

   - Allow DELEGRETURN on directories

   - Wire up GET_DIR_DELEGATION handling

  Fixes:

   - Fix kernel-doc warnings in __fcntl_getlease

   - Add needed headers for new struct delegation definition"

* tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: add needed headers for new struct delegation definition
  filelock: __fcntl_getlease: fix kernel-doc warnings
  vfs: expose delegation support to userland
  nfsd: wire up GET_DIR_DELEGATION handling
  nfsd: allow DELEGRETURN on directories
  nfsd: allow filecache to hold S_IFDIR files
  filelock: lift the ban on directory leases in generic_setlease
  vfs: make vfs_symlink break delegations on parent dir
  vfs: make vfs_mknod break delegations on parent directory
  vfs: make vfs_create break delegations on parent directory
  vfs: clean up argument list for vfs_create()
  vfs: break parent dir delegations in open(..., O_CREAT) codepath
  vfs: allow rmdir to wait for delegation break on parent
  vfs: allow mkdir to wait for delegation break on parent
  vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink}
  filelock: push the S_ISREG check down to ->setlease handlers
  filelock: add struct delegated_inode
  filelock: rework the __break_lease API to use flags
  filelock: make lease_alloc() take a flags argument
2025-12-01 15:34:41 -08:00
Jeremy Kerr 6ab578739a net: mctp: test: move TX packetqueue from dst to dev
To capture TX packets during a test, we are currently intercepting the
dst->output with an implementation that adds the transmitted packet to
a skb queue attached to the test-specific mock dst. The netdev itself is
not involved in the test TX path.

Instead, we can just use our test device to stash TXed packets for later
inspection by the test. This means we can include the actual
mctp_dst_output() implementation in the test (by setting dst.output in
the test case), and don't need to be creating fake dst objects, or their
corresponding skb queues.

We need to ensure that the netdev is up to allow delivery to
ndo_start_xmit, but the tests assume active devices at present anyway.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/20251126-dev-mctp-test-tx-queue-v2-1-4e5bbd1d6c57@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 13:52:13 -08:00
Linus Torvalds 1d18101a64 kernel-6.19-rc1.cred
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
 orJLAP9UD+dX6cicJDkzFZowDakmoIQkR5ZSDwChSlmvLcmquwEAlSq4svVd9Bdl
 7kOFUk71DqhVHrPAwO7ap0BxehokEAA=
 =Cli6
 -----END PGP SIGNATURE-----

Merge tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull cred guard updates from Christian Brauner:
 "This contains substantial credential infrastructure improvements
  adding guard-based credential management that simplifies code and
  eliminates manual reference counting in many subsystems.

  Features:

   - Kernel Credential Guards

     Add with_kernel_creds() and scoped_with_kernel_creds() guards that
     allow using the kernel credentials without allocating and copying
     them. This was requested by Linus after seeing repeated
     prepare_kernel_creds() calls that duplicate the kernel credentials
     only to drop them again later.

     The new guards completely avoid the allocation and never expose the
     temporary variable to hold the kernel credentials anywhere in
     callers.

   - Generic Credential Guards

     Add scoped_with_creds() guards for the common override_creds() and
     revert_creds() pattern. This builds on earlier work that made
     override_creds()/revert_creds() completely reference count free.

   - Prepare Credential Guards

     Add prepare credential guards for the more complex pattern of
     preparing a new set of credentials and overriding the current
     credentials with them:
      - prepare_creds()
      - modify new creds
      - override_creds()
      - revert_creds()
      - put_cred()

  Cleanups:

   - Make init_cred static since it should not be directly accessed

   - Add kernel_cred() helper to properly access the kernel credentials

   - Fix scoped_class() macro that was introduced two cycles ago

   - coredump: split out do_coredump() from vfs_coredump() for cleaner
     credential handling

   - coredump: move revert_cred() before coredump_cleanup()

   - coredump: mark struct mm_struct as const

   - coredump: pass struct linux_binfmt as const

   - sev-dev: use guard for path"
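
 [ Editor's note: a hedged sketch contrasting the open-coded override
   pattern with the guard form described above; scoped_with_creds() is
   named in the pull message, do_privileged_work() is a placeholder:

       /* open-coded pattern */
       const struct cred *old = override_creds(new_cred);
       do_privileged_work();
       revert_creds(old);

       /* guard-based equivalent, reverted automatically at scope exit */
       scoped_with_creds(new_cred)
               do_privileged_work();
   ]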

* tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
  trace: use override credential guard
  trace: use prepare credential guard
  coredump: use override credential guard
  coredump: use prepare credential guard
  coredump: split out do_coredump() from vfs_coredump()
  coredump: mark struct mm_struct as const
  coredump: pass struct linux_binfmt as const
  coredump: move revert_cred() before coredump_cleanup()
  sev-dev: use override credential guards
  sev-dev: use prepare credential guard
  sev-dev: use guard for path
  cred: add prepare credential guard
  net/dns_resolver: use credential guards in dns_query()
  cgroup: use credential guards in cgroup_attach_permissions()
  act: use credential guards in acct_write_process()
  smb: use credential guards in cifs_get_spnego_key()
  nfs: use credential guards in nfs_idmap_get_key()
  nfs: use credential guards in nfs_local_call_write()
  nfs: use credential guards in nfs_local_call_read()
  erofs: use credential guards
  ...
2025-12-01 13:45:41 -08:00
Luiz Augusto von Dentz a106e50be7 Bluetooth: HCI: Add support for LL Extended Feature Set
This adds support for emulating LL Extended Feature Set introduced in 6.0
that adds the following:

Commands:

 - HCI_LE_Read_All_Local_Supported_Features(0x2087)(Feature:47,1)
 - HCI_LE_Read_All_Remote_Features(0x2088)(Feature:47,2)

Events:

 - HCI_LE_Read_All_Remote_Features_Complete(0x2b)(Mask bit:42)

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-12-01 16:21:16 -05:00
Yang Li 56f765ce73 Bluetooth: iso: fix socket matching ambiguity between BIS and CIS
When both BIS and CIS links exist, their sockets are in
the BT_LISTEN state.
dump sock:
  sk 000000001977ef51 state 6
  src 10:a5:62:31:05:cf dst 00:00:00:00:00:00
  sk 0000000031d28700 state 7
  src 10:a5:62:31:05:cf dst 00:00:00:00:00:00
  sk 00000000613af00e state 4   # listen sock of bis
  src 10:a5:62:31:05:cf dst 54:00:00:d4:99:30
  sk 000000001710468c state 9
  src 10:a5:62:31:05:cf dst 54:00:00:d4:99:30
  sk 000000005d97dfde state 4   # listen sock of cis
  src 10:a5:62:31:05:cf dst 00:00:00:00:00:00

To locate the CIS socket correctly, check both the BT_LISTEN
state and whether dst addr is BDADDR_ANY.

Link: https://github.com/bluez/bluez/issues/1224
Signed-off-by: Yang Li <yang.li@amlogic.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-12-01 16:00:07 -05:00
Luiz Augusto von Dentz 577cf4c0a1 Bluetooth: ISO: Fix not updating BIS sender source address
The source address for a BIS sender/Broadcast Source shall be updated
with the advertisement address since, when privacy is enabled, it may
use an RPA rather than an identity address.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-12-01 16:00:06 -05:00
Luiz Augusto von Dentz a3b76bf4c4 Bluetooth: MGMT: Allow use of Set Device Flags without Add Device
In certain cases, setting device flags like HCI_CONN_FLAG_PAST
shouldn't require doing Add Device first, since it may not need to add
an auto-connect policy, so this instead just automatically creates
a hci_conn_params with HCI_AUTO_CONN_DISABLED if one cannot be found.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-12-01 16:00:06 -05:00
Luiz Augusto von Dentz f817db10dc Bluetooth: ISO: Attempt to resolve broadcast address
Broadcasters may be using RPAs, which can change over time and not
match the address used as destination in the socket, so this
attempts to resolve the addresses and then match with the socket
address, in case that uses an identity address, or else match the
IRKs if both broadcaster and socket are using RPAs.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-12-01 16:00:06 -05:00
Luiz Augusto von Dentz 14b06c3a88 Bluetooth: HCI: Always use the identity address when initializing a connection
This makes sure hci_conn is initialized with the identity address if
a matching IRK exists, which avoids the trouble of having to do it at
multiple places where it seems to be missing (e.g. CIS, BIS and PA).

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-12-01 16:00:06 -05:00
Luiz Augusto von Dentz d3413703d5 Bluetooth: ISO: Add support to bind to trigger PAST
This makes it possible to bind to a different destination address
after being connected (BT_CONNECTED, BT_CONNECT2) which then triggers
the PAST Sender procedure to transfer the PA Sync to the destination
address.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-12-01 16:00:04 -05:00
Luiz Augusto von Dentz c530569adc Bluetooth: hci_core: Introduce HCI_CONN_FLAG_PAST
This introduces a new device flag so userspace can indicate if it
wants to enable PAST Receiver for a specific device.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-12-01 15:58:54 -05:00
Luiz Augusto von Dentz 33b2835f0b Bluetooth: HCI: Add initial support for PAST
This adds PAST related commands (HCI_OP_LE_PAST,
HCI_OP_LE_PAST_SET_INFO and HCI_OP_LE_PAST_PARAMS) and events
(HCI_EV_LE_PAST_RECEIVED) along with handling of PAST sender and
receiver feature bits including new MGMT settings
(HCI_EV_LE_PAST_RECEIVED and MGMT_SETTING_PAST_RECEIVER) which
userspace can use to determine if PAST is supported by the
controller.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-12-01 15:58:54 -05:00
Alok Tiwari 09339d0d83 l2tp: correct debugfs label for tunnel tx stats
l2tp_dfs_seq_tunnel_show prints two groups of tunnel statistics. The
first group reports transmit counters, but the code labels it as rx.
Set the label to "tx" so the debugfs output reflects the actual meaning.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251128085300.3377210-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 12:03:09 -08:00
Linus Torvalds 415d34b92c namespace-6.19-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
 ooKwAP4kR5kMjHlthf8jHmmCjVU3nQFO9hUZsIQL9gFJLOIQMAD+LLoTaq1WJufl
 oSgZpREXZVmI1TK61eR6EZMB1YikGAo=
 =TExi
 -----END PGP SIGNATURE-----

Merge tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull namespace updates from Christian Brauner:
 "This contains substantial namespace infrastructure changes including a new
  system call, active reference counting, and extensive header cleanups.
  The branch depends on the shared kbuild branch for -fms-extensions support.

  Features:

   - listns() system call

     Add a new listns() system call that allows userspace to iterate
     through namespaces in the system. This provides a programmatic
     interface to discover and inspect namespaces, addressing
     longstanding limitations:

     Currently, there is no direct way for userspace to enumerate
     namespaces. Applications must resort to scanning /proc/*/ns/ across
     all processes, which is:
      - Inefficient - requires iterating over all processes
      - Incomplete - misses namespaces not attached to any running
        process but kept alive by file descriptors, bind mounts, or
        parent references
      - Permission-heavy - requires access to /proc for many processes
      - No ordering or ownership information
      - No filtering per namespace type

     The listns() system call solves these problems:

       ssize_t listns(const struct ns_id_req *req, u64 *ns_ids,
                      size_t nr_ns_ids, unsigned int flags);

       struct ns_id_req {
             __u32 size;
             __u32 spare;
             __u64 ns_id;
             struct /* listns */ {
                     __u32 ns_type;
                     __u32 spare2;
                     __u64 user_ns_id;
             };
       };

     Features include:
      - Pagination support for large namespace sets
      - Filtering by namespace type (MNT_NS, NET_NS, USER_NS, etc.)
      - Filtering by owning user namespace
      - Permission checks respecting namespace isolation

   - Active Reference Counting

     Introduce an active reference count that tracks namespace
     visibility to userspace. A namespace is visible in the following
     cases:
       (1) The namespace is in use by a task
       (2) The namespace is persisted through a VFS object (namespace file
           descriptor or bind-mount)
       (3) The namespace is a hierarchical type and is the parent of child
           namespaces

     The active reference count does not regulate lifetime (that's still
     done by the normal reference count) - it only regulates visibility
     to namespace file handles and listns().

     This prevents resurrection of namespaces that are pinned only for
     internal kernel reasons (e.g., user namespaces held by
     file->f_cred, lazy TLB references on idle CPUs, etc.) which should
     not be accessible via (1)-(3).

   - Unified Namespace Tree

     Introduce a unified tree structure for all namespaces with:
      - Fixed IDs assigned to initial namespaces
      - Lookup based solely on inode number
      - Maintained list of owned namespaces per user namespace
      - Simplified rbtree comparison helpers

   Cleanups:

    - Header Reorganization:
      - Move namespace types into separate header (ns_common_types.h)
      - Decouple nstree from ns_common header
      - Move nstree types into separate header
      - Switch to new ns_tree_{node,root} structures with helper functions
      - Use guards for ns_tree_lock

   - Initial Namespace Reference Count Optimization
      - Make all reference counts on initial namespaces a nop to avoid
        pointless cacheline ping-pong for namespaces that can never go
        away
      - Drop custom reference count initialization for initial namespaces
      - Add NS_COMMON_INIT() macro and use it for all namespaces
      - pid: rely on common reference count behavior

   - Miscellaneous Cleanups
      - Rename exit_task_namespaces() to exit_nsproxy_namespaces()
      - Rename is_initial_namespace() and make argument const
      - Use boolean to indicate anonymous mount namespace
      - Simplify owner list iteration in nstree
      - nsfs: raise SB_I_NODEV, SB_I_NOEXEC, and DCACHE_DONTCACHE explicitly
      - nsfs: use inode_just_drop()
      - pidfs: raise DCACHE_DONTCACHE explicitly
       - pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls
      - libfs: allow to specify s_d_flags
      - cgroup: add cgroup namespace to tree after owner is set
      - nsproxy: fix free_nsproxy() and simplify create_new_namespaces()

  Fixes:

   - setns(pidfd, ...) race condition

     Fix a subtle race when using pidfds with setns(). When the target
     task exits after prepare_nsset() but before commit_nsset(), the
     namespace's active reference count might have been dropped. If
     setns() then installs the namespaces, it would bump the active
     reference count from zero without taking the required reference on
     the owner namespace, leading to underflow when later decremented.

     The fix resurrects the ownership chain if necessary - if the caller
     succeeded in grabbing passive references, the setns() should
     succeed even if the target task exits or gets reaped.

   - Return EFAULT on put_user() error instead of success

   - Make sure references are dropped outside of RCU lock (some
     namespaces like mount namespace sleep when putting the last
     reference)

   - Don't skip active reference count initialization for network
     namespace

   - Add asserts for active refcount underflow

   - Add asserts for initial namespace reference counts (both passive
     and active)

   - ipc: enable is_ns_init_id() assertions

   - Fix kernel-doc comments for internal nstree functions

   - Selftests
      - 15 active reference count tests
      - 9 listns() functionality tests
      - 7 listns() permission tests
      - 12 inactive namespace resurrection tests
      - 3 threaded active reference count tests
      - commit_creds() active reference tests
      - Pagination and stress tests
      - EFAULT handling test
      - nsid tests fixes"
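
 [ Editor's note: a hedged userspace sketch of the listns() call whose
   prototype and struct ns_id_req are quoted above; the syscall number
   placeholder is an assumption until the final UAPI headers define it:

       #include <stdint.h>
       #include <stdio.h>
       #include <string.h>
       #include <sys/syscall.h>
       #include <unistd.h>

       #ifndef __NR_listns
       #define __NR_listns 470         /* assumption, not the real number */
       #endif

       struct ns_id_req {
               uint32_t size;
               uint32_t spare;
               uint64_t ns_id;
               struct {                /* listns */
                       uint32_t ns_type;
                       uint32_t spare2;
                       uint64_t user_ns_id;
               };
       };

       int main(void)
       {
               struct ns_id_req req;
               uint64_t ids[64];
               long n, i;

               memset(&req, 0, sizeof(req));
               req.size = sizeof(req);
               /* ns_type left 0 here; set a type constant from the
                * final UAPI to filter by namespace type */

               n = syscall(__NR_listns, &req, ids, 64, 0);
               if (n < 0) {
                       perror("listns");
                       return 1;
               }
               for (i = 0; i < n; i++)
                       printf("ns id: %llu\n", (unsigned long long)ids[i]);
               return 0;
       }
   ]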

* tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (103 commits)
  pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls
  nstree: fix kernel-doc comments for internal functions
  nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
  selftests/namespaces: fix nsid tests
  ns: drop custom reference count initialization for initial namespaces
  pid: rely on common reference count behavior
  ns: add asserts for initial namespace active reference counts
  ns: add asserts for initial namespace reference counts
  ns: make all reference counts on initial namespace a nop
  ipc: enable is_ns_init_id() assertions
  fs: use boolean to indicate anonymous mount namespace
  ns: rename is_initial_namespace()
  ns: make is_initial_namespace() argument const
  nstree: use guards for ns_tree_lock
  nstree: simplify owner list iteration
  nstree: switch to new structures
  nstree: add helper to operate on struct ns_tree_{node,root}
  nstree: move nstree types into separate header
  nstree: decouple from ns_common header
  ns: move namespace types into separate header
  ...
2025-12-01 09:47:41 -08:00
Oliver Hartkopp cb2dc6d286 can: Kconfig: select CAN driver infrastructure by default
The CAN bus support enabled with CONFIG_CAN provides a socket-based
access to CAN interfaces. With the introduction of the latest CAN protocol,
CAN XL, more configuration and status information needs to be exposed to
the network layer than was formerly provided by standard Linux network drivers.

This requires the CAN driver infrastructure to be selected by default.
As the CAN network layer can only operate on CAN interfaces anyway, all
distributions and common default configs enable at least one CAN driver.

So selecting CONFIG_CAN_DEV when CONFIG_CAN is selected by the user has
no effect on established configurations but solves potential build issues
when CONFIG_CAN[_XXX]=y is set together with CONFIG_CAN_DEV=m.

Fixes: 1a620a7238 ("can: raw: instantly reject unsupported CAN frames")
Reported-by: Vincent Mailhol <mailhol@kernel.org>
Closes: https://lore.kernel.org/all/CAMZ6RqL_nGszwoLPXn1Li8op-ox4k3Hs6p=Hw6+w0W=DTtobPw@mail.gmail.com/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511280531.YnWW2Rxc-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202511280842.djCQ0N0O-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202511282325.uVQFRTkA-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202511291520.guIE1QHj-lkp@intel.com/
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20251129090500.17484-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-29 13:37:12 +01:00
Thorsten Blum ff736a2861 net: ipconfig: Replace strncpy with strscpy in ic_proto_name
strncpy() is deprecated [1] for NUL-terminated destination buffers
because it does not guarantee NUL termination. Replace it with strscpy()
to ensure the destination buffer is always NUL-terminated and to avoid
any additional NUL padding.

Although the identifier buffer has 252 usable bytes, strncpy() copied
only up to 251 bytes to the zero-initialized buffer, relying on the last
byte to act as an implicit NUL terminator. Switching to strscpy() avoids
this implicit behavior and does not use magic numbers.

The source string is also NUL-terminated and satisfies the
__must_be_cstr() requirement of strscpy().
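
[ Editor's note: a hedged before/after sketch of the change described
  above; the buffer and source names are illustrative rather than the
  actual ic_proto_name code:

    static char ident[252];

    /* before: relies on the zeroed last byte as an implicit terminator */
    strncpy(ident, name, sizeof(ident) - 1);

    /* after: explicit NUL termination, no trailing NUL padding */
    strscpy(ident, name, sizeof(ident));
  ]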

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20251126220804.102160-2-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:19:16 -08:00
Breno Leitao e5235eb6cf net: netpoll: initialize work queue before error checks
Prevent a kernel warning when netconsole setup fails on devices with
IFF_DISABLE_NETPOLL flag. The warning (at kernel/workqueue.c:4242 in
__flush_work) occurs because the cleanup path tries to cancel an
uninitialized work queue.

When __netpoll_setup() encounters a device with IFF_DISABLE_NETPOLL,
it fails early and calls skb_pool_flush() for cleanup. This function
calls cancel_work_sync(&np->refill_wq), but refill_wq hasn't been
initialized yet, triggering the warning.

Move INIT_WORK() to the beginning of __netpoll_setup(), ensuring the
work queue is properly initialized before any potential failure points.
This allows the cleanup path to safely cancel the work queue regardless
of where the setup fails.
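
[ Editor's note: a hedged sketch of the ordering described above; the
  work handler name and error path shape are assumptions, while
  np->refill_wq and IFF_DISABLE_NETPOLL come from the commit text:

    int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
    {
            /* initialise first, so any failure path can safely call
             * cancel_work_sync(&np->refill_wq) during cleanup */
            INIT_WORK(&np->refill_wq, netpoll_refill_work);

            if (ndev->priv_flags & IFF_DISABLE_NETPOLL)
                    return -ENOTSUPP;

            /* ... rest of the setup ... */
            return 0;
    }
  ]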

Fixes: 248f6571fd ("netpoll: Optimize skb refilling on critical path")
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251127-netpoll_fix_init_work-v1-1-65c07806d736@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:16:57 -08:00
Jakub Kicinski 840a64710e netfilter pull request 25-11-28
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmko6jsACgkQ1w0aZmrP
 KyGIGBAAkmLtKNMnouv2eOJjJb50ERQ1cYvKG3zSI5GrOnkYvfS3MfU5rLuBR/ee
 L/xRpgNZdXMFAu1nkpFbNIoSwpOe3JaUuizlzLwTRYRmtZeRlGfvzDqiY4CDYKU1
 7gBP0EMeTeF0SJRntU6S+zoTY7Xru5w40u5wVTnm0etiigwklv4EgixnzuSLSdkz
 Av3KLE0BN85cNs6onZ6s4N4dEpIyQ7Ln0imdFiJOLvg42lM6uVNfXB6CxUIo/tIC
 VzY9vQ5rTfhcNx3lRbaJaDOE6k01x+RsBM15AkkAlafLMfvRIH4zK9qiV9tfT6c+
 t7md70+7w6j7zB9sXuI1tSMOCMvtxYfB49RJVomasEJ8J7VZ+x/7vaFYSfvydEVb
 hy1v9jOuViWWCEQhswLwQw/Xl42MVCE/zReHHBAxIC+I7nAZgEYqOCtYYPex3gZq
 l5gfiJhWqdg5yOuQepZkNo5TaFbkANgFcDuUp8IfWsbwZ2xdIIqIbHVNmenr0UuS
 4ml+t8is/rsLi/gHoKfmfbG64wG1reVcRpVxWQljr9ePkg+04fRtesaOG44k/R+i
 wdUxHL4D4WV2SnNHznw8J12tgbsIc/VgwU0EFEUxUahc18quxaumZTVuL7enbFw1
 3qgN+9qQ5ONDuABR9fedFGIoCFmOVkZLXgJnLgTC7bbZ6v0GvSM=
 =exaR
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-25-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next:

0) Add sanity check for maximum encapsulations in bridge vlan,
   reported by the new AI robot.

1) Move the flowtable path discovery code to its own file, the
   nft_flow_offload.c mixes the nf_tables evaluation with the path
   discovery logic, just split this in two for clarity.

2) Consolidate flowtable xmit path by using dev_queue_xmit() and the
   real device behind the layer 2 vlan/pppoe device. This allows to
   inline encapsulation. After this update, hw_ifidx can be removed
   since both ifidx and hw_ifidx now point to the same device.

3) Support for IPIP encapsulation in the flowtable, extend selftest
   to cover for this new layer 3 offload, from Lorenzo Bianconi.

4) Push down the skb into the conncount API to fix duplicates in the
   conncount list for packets with non-confirmed conntrack entries,
   this is due to an optimization introduced in d265929930
   ("netfilter: nf_conncount: reduce unnecessary GC").
   From Fernando Fernandez Mancera.

5) In conncount, disable BH when performing garbage collection
   to consolidate existing behaviour in the conncount API, also
   from Fernando.

6) A matching packet with a confirmed conntrack invokes GC if
   conncount reaches the limit in an attempt to release slots.
   This allows the existing extensions to be used for real conntrack
   counting, not just limiting new connections, from Fernando.

7) Support for updating ct count objects in nf_tables, from Fernando.

8) Extend nft_flowtables.sh selftest to send IPv6 TCP traffic,
   from Lorenzo Bianconi.

9) Fixes for UAPI kernel-doc documentation, from Randy Dunlap.

* tag 'nf-next-25-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: improve UAPI kernel-doc comments
  netfilter: ip6t_srh: fix UAPI kernel-doc comments format
  selftests: netfilter: nft_flowtable.sh: Add the capability to send IPv6 TCP traffic
  netfilter: nft_connlimit: add support to object update operation
  netfilter: nft_connlimit: update the count if add was skipped
  netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH
  netfilter: nf_conncount: rework API to use sk_buff directly
  selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest
  netfilter: flowtable: Add IPIP tx sw acceleration
  netfilter: flowtable: Add IPIP rx sw acceleration
  netfilter: flowtable: use tuple address to calculate next hop
  netfilter: flowtable: remove hw_ifidx
  netfilter: flowtable: inline pppoe encapsulation in xmit path
  netfilter: flowtable: inline vlan encapsulation in xmit path
  netfilter: flowtable: consolidate xmit path
  netfilter: flowtable: move path discovery infrastructure to its own file
  netfilter: flowtable: check for maximum number of encapsulations in bridge vlan
====================

Link: https://patch.msgid.link/20251128002345.29378-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:08:39 -08:00
Vladimir Oltean 64b0d2edb6 net: dsa: tag_yt921x: use the dsa_xmit_port_mask() helper
The "yt921x" tagging protocol populates a bit mask for the TX ports,
so we can use dsa_xmit_port_mask() to centralize the decision of how to
set that field.

Cc: David Yang <mmyangfl@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-16-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:41 -08:00
Vladimir Oltean 24099389a6 net: dsa: tag_xrs700x: use the dsa_xmit_port_mask() helper
The "xrs700x" is the original DSA tagging protocol with HSR TX
replication support, we now essentially move that logic to the
dsa_xmit_port_mask() helper. The end result is something akin to
hellcreek_xmit() (but reminds me I should also take care of
skb_checksum_help() for tail taggers in the core).

The implementation differences to dsa_xmit_port_mask() are immaterial.

Cc: George McCollister <george.mccollister@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-15-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:41 -08:00
Vladimir Oltean 3c1975bbdf net: dsa: tag_trailer: use the dsa_xmit_port_mask() helper
The "trailer" tagging protocol populates a bit mask for the TX ports, so
we can use dsa_xmit_port_mask() to centralize the decision of how to set
that field.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-14-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:41 -08:00
Vladimir Oltean b33aa90e68 net: dsa: tag_rzn1_a5psw: use the dsa_xmit_port_mask() helper
The "a5psw" tagging protocol populates a bit mask for the TX ports,
so we can use dsa_xmit_port_mask() to centralize the decision of how to
set that field.

Cc: "Clément Léger" <clement.leger@bootlin.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-13-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:41 -08:00
Vladimir Oltean 5afe4ccc33 net: dsa: tag_rtl8_4: use the dsa_xmit_port_mask() helper
The "rtl8_4" and "rtl8_4t" tagging protocols populate a bit mask for the
TX ports, so we can use dsa_xmit_port_mask() to centralize the decision
of how to set that field.

Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: "Alvin Šipraga" <alsi@bang-olufsen.dk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-12-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:40 -08:00
Vladimir Oltean 4abf39c8ae net: dsa: tag_rtl4_a: use the dsa_xmit_port_mask() helper
The "rtl4a" tagging protocol populates a bit mask for the TX ports,
so we can use dsa_xmit_port_mask() to centralize the decision of how to
set that field.

Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: "Alvin Šipraga" <alsi@bang-olufsen.dk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-11-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:40 -08:00
Vladimir Oltean 48afabaf4a net: dsa: tag_qca: use the dsa_xmit_port_mask() helper
The "qca" tagging protocol populates a bit mask for the TX ports, so we
can use dsa_xmit_port_mask() to centralize the decision of how to set
that field.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-10-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:40 -08:00
Vladimir Oltean 5733fe2a7a net: dsa: tag_ocelot: use the dsa_xmit_port_mask() helper
The "ocelot" and "seville" tagging protocols populate a bit mask for the
TX ports, so we can use dsa_xmit_port_mask() to centralize the decision
of how to set that field.

This protocol used BIT_ULL() rather than simple BIT() to silence Smatch,
as explained in commit 1f778d500d ("net: mscc: ocelot: avoid type
promotion when calling ocelot_ifh_set_dest"). I would expect that this
tool no longer complains now that BIT(dp->index) is hidden inside the
dsa_xmit_port_mask() function, whose return value is promoted to u64.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-9-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:40 -08:00
Vladimir Oltean a4a00d9e36 net: dsa: tag_mxl_gsw1xx: use the dsa_xmit_port_mask() helper
The "gsw1xx" tagging protocol populates a bit mask for the TX ports, so
we can use dsa_xmit_port_mask() to centralize the decision of how to set
that field.

Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-8-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:40 -08:00
Vladimir Oltean 84a60bbec5 net: dsa: tag_mtk: use the dsa_xmit_port_mask() helper
The "mtk" tagging protocol populates a bit mask for the TX ports, so we
can use dsa_xmit_port_mask() to centralize the decision of how to set
that field.

Cc: Chester A. Unal" <chester.a.unal@arinc9.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-7-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:40 -08:00
Vladimir Oltean ea659a9292 net: dsa: tag_ksz: use the dsa_xmit_port_mask() helper
The "ksz8795", "ksz9893", "ksz9477" and "lan937x" tagging protocols
populate a bit mask for the TX ports.

Unlike the others, "ksz9477" also accelerates HSR packet duplication.

Make the HSR duplication logic available generically to all 4 taggers by
using the dsa_xmit_port_mask() function to set the TX port mask.

Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:39 -08:00
Vladimir Oltean f59e44cc0d net: dsa: tag_hellcreek: use the dsa_xmit_port_mask() helper
The "hellcreek" tagging protocol populates a bit mask for the TX ports,
so we can use dsa_xmit_port_mask() to centralize the decision of how to
set that field.

Cc: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:39 -08:00
Vladimir Oltean e094428fb4 net: dsa: tag_gswip: use the dsa_xmit_port_mask() helper
The "gswip" tagging protocol populates a bit mask for the TX ports, so
we can use dsa_xmit_port_mask() to centralize the decision of how to set
that field.

Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:39 -08:00
Vladimir Oltean 621d06a40e net: dsa: tag_brcm: use the dsa_xmit_port_mask() helper
The "brcm" and "brcm-prepend" tagging protocols populate a bit mask for
the TX ports, so we can use dsa_xmit_port_mask() to centralize the
decision of how to set that field. The port mask is written u8 by u8,
first the high octet and then the low octet.
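
As a rough illustration of that octet-by-octet layout (the offsets below
and the argument order of the helper are made up for the example, not the
driver's actual layout):

  u32 port_mask = dsa_xmit_port_mask(skb, dp);

  /* hypothetical 2-byte field: high octet first, then low octet */
  brcm_tag[0] = (port_mask >> 8) & 0xff;
  brcm_tag[1] = port_mask & 0xff;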

Cc: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:39 -08:00
Vladimir Oltean 6f2e1c75bc net: dsa: introduce the dsa_xmit_port_mask() tagging protocol helper
Many tagging protocols deal with the transmit port mask being a bit
mask, and set it to BIT(dp->index). Not a big deal.

Also, some tagging protocols are written for switches which support HSR
offload (including packet duplication offload); there we see a walk
using dsa_hsr_foreach_port() to find the other port in the same switch
that's a member of the HSR, and set that bit in the port mask too.

That isn't sufficiently interesting either, until you come to realize
that there isn't anything special about the second case that switches
from just the first one can't do too.

It just becomes a matter of "is it wise to do it? are sufficient people
using HSR/PRP with generic off-the-shelf switches to justify adding an
extra test in the data path?" - the answer to which is probably "it
depends". Not having HSR offload at all isn't _much_ worse - certainly
not so much worse as to make it impractical, esp. with a rich OS like
Linux. But the HSR users are rather specialized in industrial networking.

Anyway, the change acts on the premise that we're going to have support
for this, that it should be uniformly implemented for everyone, and that
if we find some sort of balance, we can keep everyone relatively happy.

So I've disabled that logic if CONFIG_HSR isn't enabled, and I've tilted
the branch predictor to say it's unlikely we're transmitting through a
port with this capability currently active. On branch miss, we're still
going to save the transmission of one packet, so there's some remaining
benefit there too. I don't _think_ we need to jump to static keys yet.

The helper returns a 32-bit, zero-based unsigned bit mask that callers
have to transpose using FIELD_PREP(). It is not the first time we assume
DSA switches won't be larger than 32 ports - dsa_user_ports() has that
assumption baked into it too.

One last development note on why the "skb" argument is passed even
though it isn't used. Looking at the compiled code on arm64, which is
identical both with and without it, the answer is "why not?" - who knows
what other features dependent on the skb may be handled in the future.
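
To make the shape of the helper concrete, below is a minimal sketch of
what such a function could look like, based purely on the description
above; the field and iterator names are assumptions for illustration,
not necessarily the exact upstream code:

  static u32 dsa_xmit_port_mask(struct sk_buff *skb, struct dsa_port *dp)
  {
          u32 port_mask = BIT(dp->index);
          struct dsa_port *other;

          /* HSR/PRP packet duplication offload: also set the bit of the
           * other port of the same switch that is a member of the ring,
           * so the switch sends the frame out of both ports in one pass.
           */
          if (IS_ENABLED(CONFIG_HSR) && unlikely(dp->hsr_dev))
                  dsa_hsr_foreach_port(other, dp->ds, dp->hsr_dev)
                          port_mask |= BIT(other->index);

          return port_mask;
  }

A tagger would then transpose the result into its header field, e.g.
with FIELD_PREP(TAG_PORT_MASK_FIELD, dsa_xmit_port_mask(skb, dp)),
where TAG_PORT_MASK_FIELD stands for whatever mask the protocol defines.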

Link: https://lore.kernel.org/netdev/20251126093240.2853294-4-mmyangfl@gmail.com/
Cc: "Alvin Šipraga" <alsi@bang-olufsen.dk>
Cc: Chester A. Unal" <chester.a.unal@arinc9.com>
Cc: "Clément Léger" <clement.leger@bootlin.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: David Yang <mmyangfl@gmail.com>
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Jonas Gorski <jonas.gorski@gmail.com>
Cc: Kurt Kanzenbach <kurt@linutronix.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: UNGLinuxDriver@microchip.com
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 20:03:39 -08:00
Jakub Kicinski 2c80116b50 Apart from the usual small things just driver updates:
- mt76:
    - WED support for >32-bit DMA
    - airoha NPU support
    - regdomain improvements
    - continued WiFi7/MLO work
  - rtw89
    - support USB devices RTL8852AU and RTL8852CU
    - initial work for RTL8922DE
    - improved injection support
  - rtl8xxxu: 40 MHz connection fixes/support
  - brcmfmac: Acer A1 840 tablet quirk
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmkoKZoACgkQ10qiO8sP
 aADi1g//e4/kTZzl8j09V/PU+2xPQ6dqNBwsYjwowl4CPusWEJqny0M5nOs9F1ob
 5lpVY3rMl4S6D5yUHY9B1fBkAgj3xuky4Udm0KONpwiGMexIn1CjlND5Qa2XW2fz
 BaHMoCI6RXzdgQoQqWQNtyxvstsb5PXfAE8h3avO+uoFhfm9zdvmWLw4cjgy76qo
 YAcUhwgIntc3oouDajOahwnxNDR2ZmZ+ATwDsmoqzhOtvTLoARnZD4+tLD1VFe+L
 yW3FdrbQYYOAdRyYiIcCIiLfr9AvqeEluCYy06J4Viafkf8io84IgijTuxM8tHpp
 spA0RA0LWNwcaYG6xf07VwjwbuhnhJEZEAfapEqhF7R6zcH7ZA6Y3vLB9JhB9bPX
 UnOb+kLrqiwnKHyHbcyW8uVFPj4D9vl9xDKM0wGCKFrv14q4Cwy/uIWW5Vy6GJnh
 Iyft0RxG83jU4x3uSx9Ywss/ByfhBuRChrBpy3ud6hf5D5dtbnH2310kBvmNMla0
 G+y2/EDjmC4uFAglKS7CwoYHYE6KJclg1hxX8jZoKq5EoKBT+n7/uM8a6vDa5lW5
 l1Sa3nJHfHHbQCQKc8jSHoC543rYMid36bJpUtnaWse35cN7bQI2v1EFuOmQD4zO
 W/OSJnH7roGUaFu9k76cCQjXWgF4NEgmRGlnvSrPEJ5YnPDJ6VU=
 =7c9f
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2025-11-27' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Apart from the usual small things just driver updates:
 - mt76:
   - WED support for >32-bit DMA
   - airoha NPU support
   - regdomain improvements
   - continued WiFi7/MLO work
 - rtw89
   - support USB devices RTL8852AU and RTL8852CU
   - initial work for RTL8922DE
   - improved injection support
 - rtl8xxxu: 40 MHz connection fixes/support
 - brcmfmac: Acer A1 840 tablet quirk

* tag 'wireless-next-2025-11-27' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (152 commits)
  wifi: mac80211: allow sharing identical chanctx for S1G interfaces
  wifi: nl80211: vendor-cmd: intel: fix a blank kernel-doc line warning
  wifi: cfg80211: include s1g_primary_2mhz when comparing chandefs
  wifi: cfg80211: include s1g_primary_2mhz when sending chandef
  wifi: ieee80211: correct FILS status codes
  mt76: mt7615: Fix memory leak in mt7615_mcu_wtbl_sta_add()
  wifi: mt76: mt792x: fix wifi init fail by setting MCU_RUNNING after CLC load
  wifi: mt76: Strip whitespace from build ddate
  wifi: mt76: mt7996: Add missing locking in mt7996_mac_sta_rc_work()
  wifi: mt76: mt7996: skip ieee80211_iter_keys() on scanning link remove
  wifi: mt76: mt7996: skip deflink accounting for offchannel links
  wifi: mt76: Move mt76_abort_scan out of mt76_reset_device()
  wifi: mt76: mt7996: move mt7996_update_beacons under mt76 mutex
  wifi: mt76: mt7996: grab mt76 mutex in mt7996_mac_sta_event()
  wifi: mt76: mt7925: ensure the 6GHz A-MPDU density cap from the hardware.
  wifi: mt76: mt7996: fix EMI rings for RRO
  wifi: mt76: mt7996: fix using wrong phy to start in mt7996_mac_restart()
  wifi: mt76: mt7996: fix MLO set key and group key issues
  wifi: mt76: mt7996: fix MLD group index assignment
  wifi: mt76: mt7996: use correct link_id when filling TXD and TXP
  ...
====================

Link: https://patch.msgid.link/20251127103806.17776-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 19:34:21 -08:00
Heiko Carstens c940be4c7c net: Remove KMSG_COMPONENT macro
The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel message
catalog" from 2008 [1] which never made it upstream.

The macro was added to s390 code to allow for an out-of-tree patch which
used this to generate unique message ids. Also this out-of-tree patch
doesn't exist anymore.

The pattern of how the KMSG_COMPONENT macro is used can also be found in
some non-s390-specific code, for whatever reason. Besides adding an
indirection it is unused.

Remove the macro in order to get rid of a pointless indirection. Replace
all users with the string it defines. In all cases this leads to a simple
replacement like this:

 - #define KMSG_COMPONENT "af_iucv"
 - #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
 + #define pr_fmt(fmt) "af_iucv: " fmt

[1] https://lwn.net/Articles/292650/

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Sidraya Jayagond <sidraya@linux.ibm.com>
Link: https://patch.msgid.link/20251126140705.1944278-1-hca@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-28 19:20:27 -08:00
Christian Brauner 4667d63872
net/socket: convert __sys_accept4_file() to FD_ADD()
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-31-b6efa1706cfd@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 12:42:34 +01:00
Christian Brauner 245f0d1c62
net/socket: convert sock_map_fd() to FD_ADD()
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-30-b6efa1706cfd@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 12:42:34 +01:00
Christian Brauner 0d52d06a19
net/kcm: convert kcm_ioctl() to FD_PREPARE()
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-28-b6efa1706cfd@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 12:42:34 +01:00
Christian Brauner fe67b063f6
net/handshake: convert handshake_nl_accept_doit() to FD_PREPARE()
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-27-b6efa1706cfd@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 12:42:34 +01:00
Christian Brauner 7352c6fce3
af_unix: convert unix_file_open() to FD_ADD()
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-19-b6efa1706cfd@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 12:42:33 +01:00
Jakub Kicinski 4c03592689 net: restore napi_consume_skb()'s NULL-handling
Commit e20dfbad8a ("net: fix napi_consume_skb() with alien skbs")
added an skb->cpu check to napi_consume_skb() before the point where
napi_consume_skb() validated that skb is not NULL.

Add an explicit check to the early exit condition.
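
A hedged sketch of the intended early-exit shape (simplified, not the
exact upstream diff):

  void napi_consume_skb(struct sk_buff *skb, int budget)
  {
          /* Zero budget means a non-NAPI caller (e.g. netpoll); the skb
           * may also legitimately be NULL here, so bail out before any
           * field access.
           */
          if (unlikely(!skb || !budget)) {
                  if (skb)
                          dev_consume_skb_any(skb);
                  return;
          }

          /* ... fast NAPI recycling path ... */
  }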

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-27 17:44:22 -08:00
Byungchul Park df59bb5b9a netmem, devmem, tcp: access pp fields through @desc in net_iov
Convert all the legacy code directly accessing the pp fields in net_iov
to access them through @desc in net_iov.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-27 17:41:51 -08:00
Fernando Fernandez Mancera c4cbe4a4df netfilter: nft_connlimit: add support to object update operation
This is useful to update the limit or flags without clearing the
connections tracked. Use READ_ONCE() in the packet path as the value can
be modified from the control plane.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-28 00:06:43 +00:00
Fernando Fernandez Mancera 69894e5b4c netfilter: nft_connlimit: update the count if add was skipped
Connlimit expression can be used for all kind of packets and not only
for packets with connection state new. See this ruleset as example:

table ip filter {
        chain input {
                type filter hook input priority filter; policy accept;
                tcp dport 22 ct count over 4 counter
        }
}

Currently, if the connection count goes over the limit the counter will
count the packets. When a connection is closed, the connection count
won't decrement as it should because it is only updated for new
connections due to an optimization on __nf_conncount_add() that prevents
updating the list if the connection is duplicated.

To solve this problem, check whether the connection was skipped and if
so, update the list. Adjust count_tree() too so the same fix is applied
for xt_connlimit.

Fixes: 976afca1ce ("netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup")
Closes: https://lore.kernel.org/netfilter/trinity-85c72a88-d762-46c3-be97-36f10e5d9796-1761173693813@3c-app-mailcom-bs12/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-28 00:05:52 +00:00
Fernando Fernandez Mancera c0362b5748 netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH
For convenience when performing GC over the connection list, make
nf_conncount_gc_list() disable BH itself. This unifies the behavior with
nf_conncount_add() and nf_conncount_count().
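
A minimal sketch of the intended pattern (the list walk itself is elided
and the worker name is illustrative):

  /* match nf_conncount_add()/nf_conncount_count(): keep BH disabled
   * while walking the list and dropping stale entries
   */
  local_bh_disable();
  __nf_conncount_gc_list(net, list);   /* illustrative unlocked worker */
  local_bh_enable();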

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-28 00:05:52 +00:00
Fernando Fernandez Mancera be102eb6a0 netfilter: nf_conncount: rework API to use sk_buff directly
When using the nf_conncount infrastructure for non-confirmed connections,
duplicated tracking is possible due to an optimization introduced in
commit d265929930 ("netfilter: nf_conncount: reduce unnecessary GC").

In order to fix this, introduce a new conncount API that receives an
sk_buff struct directly. It fetches the tuple, the zone and the
corresponding ct from it. It comes in both existing conncount variants,
nf_conncount_count_skb() and nf_conncount_add_skb(). In addition, remove
the old API and adjust all the users to the new one.

This way, for each sk_buff struct it is possible to check if there is a
ct present and already confirmed. If so, skip the add operation.
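
A hedged sketch of the core check the skb-based variants can now make
(only the conntrack test is shown; the surrounding signatures are
abbreviated):

  enum ip_conntrack_info ctinfo;
  struct nf_conn *ct = nf_ct_get(skb, &ctinfo);

  /* A confirmed conntrack entry is already present in the conncount
   * list, so only count it; unconfirmed packets may still add a node.
   */
  bool skip_add = ct && nf_ct_is_confirmed(ct);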

Fixes: d265929930 ("netfilter: nf_conncount: reduce unnecessary GC")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-28 00:05:49 +00:00
Lorenzo Bianconi d30301ba4b netfilter: flowtable: Add IPIP tx sw acceleration
Introduce sw acceleration for tx path of IPIP tunnels relying on the
netfilter flowtable infrastructure.
This patch introduces basic infrastructure to accelerate other tunnel
types (e.g. IP6IP6).
IPIP sw tx acceleration can be tested running the following scenario where
the traffic is forwarded between two NICs (eth0 and eth1) and an IPIP
tunnel is used to access a remote site (using eth1 as the underlay device):

ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2)

$ip addr show
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.2/24 scope global eth0
       valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 scope global eth1
       valid_lft forever preferred_lft forever
8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 192.168.1.1 peer 192.168.1.2
    inet 192.168.100.1/24 scope global tun0
       valid_lft forever preferred_lft forever

$ip route show
default via 192.168.100.2 dev tun0
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1
192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1

$nft list ruleset
table inet filter {
        flowtable ft {
                hook ingress priority filter
                devices = { eth0, eth1 }
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
                meta l4proto { tcp, udp } flow add @ft
        }
}

Reproducing the scenario described above using veths I got the following
results:
- TCP stream transmitted into the IPIP tunnel:
  - net-next: (baseline)                ~ 85Gbps
  - net-next + IPIP flowtable support:  ~102Gbps

Co-developed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-28 00:00:45 +00:00
Lorenzo Bianconi ab427db178 netfilter: flowtable: Add IPIP rx sw acceleration
Introduce sw acceleration for rx path of IPIP tunnels relying on the
netfilter flowtable infrastructure. Subsequent patches will add sw
acceleration for IPIP tunnels tx path.
This series introduces basic infrastructure to accelerate other tunnel
types (e.g. IP6IP6).
IPIP rx sw acceleration can be tested running the following scenario where
the traffic is forwarded between two NICs (eth0 and eth1) and an IPIP
tunnel is used to access a remote site (using eth1 as the underlay device):

ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2)

$ip addr show
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.2/24 scope global eth0
       valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 scope global eth1
       valid_lft forever preferred_lft forever
8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 192.168.1.1 peer 192.168.1.2
    inet 192.168.100.1/24 scope global tun0
       valid_lft forever preferred_lft forever

$ip route show
default via 192.168.100.2 dev tun0
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1
192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1

$nft list ruleset
table inet filter {
        flowtable ft {
                hook ingress priority filter
                devices = { eth0, eth1 }
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
                meta l4proto { tcp, udp } flow add @ft
        }
}

Reproducing the scenario described above using veths I got the following
results:
- TCP stream received from the IPIP tunnel:
  - net-next: (baseline)		~ 71Gbps
  - net-next + IPIP flowtable support:	~101Gbps

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-28 00:00:38 +00:00
Pablo Neira Ayuso a0d98b641d netfilter: flowtable: use tuple address to calculate next hop
This simplifies IPIP tunnel support coming in follow up patches.

No function changes are intended.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-28 00:00:30 +00:00
Pablo Neira Ayuso 030feea309 netfilter: flowtable: remove hw_ifidx
hw_ifidx was originally introduced to store the real netdevice as a
requirement for the hardware offload support in:

 73f97025a9 ("netfilter: nft_flow_offload: use direct xmit if hardware offload is enabled")

Since ("netfilter: flowtable: consolidate xmit path"), ifidx and
hw_ifidx points to the real device in the xmit path, remove it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-28 00:00:22 +00:00
Pablo Neira Ayuso 18d27bed08 netfilter: flowtable: inline pppoe encapsulation in xmit path
Push the pppoe header from the flowtable xmit path; inlining is faster
than the original xmit path because it can avoid some locking.

This is based on a patch originally written by wenxu.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-28 00:00:14 +00:00
Pablo Neira Ayuso c653d5a78f netfilter: flowtable: inline vlan encapsulation in xmit path
Push the vlan header from the flowtable xmit path, instead of passing
the packet to the vlan device.

This is based on a patch originally written by wenxu.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-28 00:00:04 +00:00
Pablo Neira Ayuso b5964aac51 netfilter: flowtable: consolidate xmit path
Use dev_queue_xmit() for the XMIT_NEIGH case. Store the interface index
of the real device behind the vlan/pppoe device; this introduces an
extra lookup for the real device in the xmit path because rt->dst.dev
provides the vlan/pppoe device.

XMIT_NEIGH now looks more similar to XMIT_DIRECT but the check for stale
dst and the neighbour lookup still remain in place which is convenient
to deal with network topology changes.

Note that nft_flow_route() needs to relax the check for _XMIT_NEIGH so
the existing basic xfrm offload (which only works in one direction) does
not break.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-27 23:59:56 +00:00
Pablo Neira Ayuso 93d7a7ed07 netfilter: flowtable: move path discovery infrastructure to its own file
This file contains the path discovery that is run from the forward chain
for the packet offloading the flow into the flowtable. This consists
of a series of calls to dev_fill_forward_path() for each device stack.

More topologies may be supported in the future, so move this code to its
own file to separate it from the nftables flow_offload expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-27 23:59:43 +00:00
Pablo Neira Ayuso 634f3853cc netfilter: flowtable: check for maximum number of encapsulations in bridge vlan
Add a sanity check to skip path discovery if the maximum number of
encapsulations is reached. While at it, check for underflow too.

Fixes: 26267bf9bb ("netfilter: flowtable: bridge vlan hardware offload and switchdev")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-11-27 23:51:31 +00:00
Jakub Kicinski db4029859d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

net/xdp/xsk.c
  0ebc27a4c6 ("xsk: avoid data corruption on cq descriptor number")
  8da7bea7db ("xsk: add indirect call for xsk_destruct_skb")
  30ed05adca ("xsk: use a smaller new lock for shared pool case")
https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au
https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-27 12:19:08 -08:00
Linus Torvalds e1afacb685 A patch to make sparse read handling work in msgr2 secure mode from
Slava and a couple of fixes from Ziming and myself to avoid operating
 on potentially invalid memory, all marked for stable.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmkomw8THGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi7inB/4kGwlnraCx43Lw4UG4K/Z6RKypQ8zD
 WxwFvszq9jHJL7uUvXLx97sZApeK+AG6Kql/jnpEUUkFlG1h3013OwKJgRINrOdn
 5wJaWwi73akCWt3FDgcpc+gnxQ3s7gbD3cPbNeEhoHTjUrHk59ZgT37zbnUIrFK3
 NqnSBO7+7EPZtWwmC7mtki+lRLr3z47TqeqLqLQhw6GoenRo/9ZrUynvneCWKXfT
 bZRZ5MDJGg/ZMR1agPmUojgHkkZAD5xc21fK5mmYeMr9JbNB4Vw9BO48XW+WCA3o
 TH5en6HZhJuoxkp21mG7BZjnkAVTPCpo3pMt2IqlAIWCkODtXlThWGQN
 =cZYx
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.18-rc8' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A patch to make sparse read handling work in msgr2 secure mode from
  Slava and a couple of fixes from Ziming and myself to avoid operating
  on potentially invalid memory, all marked for stable"

* tag 'ceph-for-6.18-rc8' of https://github.com/ceph/ceph-client:
  libceph: prevent potential out-of-bounds writes in handle_auth_session_key()
  libceph: replace BUG_ON with bounds check for map->max_osd
  ceph: fix crash in process_v2_sparse_read() for encrypted directories
  libceph: drop started parameter of __ceph_open_session()
  libceph: fix potential use-after-free in have_mon_and_osd_map()
2025-11-27 11:11:03 -08:00
Paolo Abeni 73f784b2c9 linux-can-next-for-6.19-20251126
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEn/sM2K9nqF/8FWzzDHRl3/mQkZwFAmkm2EcTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAMdGXf+ZCRnCT7B/98qQUzIef9I3Z0jsQhPmNHzyf/lO2d
 dym2qE3axbLfiUziC8FA997iDellCR/TqOHD1iVsfw2ifaiBwKx63ZOv3AY1ob0+
 C0lM6qTBA8HNn1M9Ij4x96v+qz3dE5n0eyvKvDdEmiUL6gz9T+QItjTdph4WIyWL
 bpnPPUs75KLDtlTkztTJ807fStnCJMn6UP+tXcqjLE+1lZ9k+hpLH4jzprn4oVry
 vLU5TmbXkDZVL9MdpecExHE5mTV1f1Fch7TtliUoBY9CFZ58CSXkuSzu38F3j/xD
 nytAydXNy6wjDU5GkIca2g0MPWmBI3EF5QHvfnWBjxScQC6MdUxP5wrJ
 =F3+Y
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-6.19-20251126' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2025-11-26

this is a pull request of 27 patches for net-next/main.

The first 17 patches are by Vincent Mailhol and Oliver Hartkopp and
add CAN XL support to the CAN netlink interface.

Geert Uytterhoeven and Biju Das provide 7 patches for the rcar_canfd
driver to add suspend/resume support.

The next 2 patches are by Markus Schneider-Pargmann and add him as
the m_can maintainer.

Conor Dooley's patch updates the mpfs-can DT bindings.

linux-can-next-for-6.19-20251126

* tag 'linux-can-next-for-6.19-20251126' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (27 commits)
  dt-bindings: can: mpfs: document resets
  MAINTAINERS: Simplify m_can section
  MAINTAINERS: Add myself as m_can maintainer
  can: rcar_canfd: Add suspend/resume support
  can: rcar_canfd: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
  can: rcar_canfd: Invert CAN clock and close_candev() order
  can: rcar_canfd: Extract rcar_canfd_global_{,de}init()
  can: rcar_canfd: Use devm_clk_get_optional() for RAM clk
  can: rcar_canfd: Invert global vs. channel teardown
  can: rcar_canfd: Invert reset assert order
  can: dev: print bitrate error with two decimal digits
  can: raw: instantly reject unsupported CAN frames
  can: add dummy_can driver
  can: calc_bittiming: add can_calc_sample_point_pwm()
  can: calc_bittiming: add can_calc_sample_point_nrz()
  can: calc_bittiming: replace misleading "nominal" by "reference"
  can: netlink: add PWM netlink interface
  can: calc_bittiming: add PWM calculation
  can: bittiming: add PWM validation
  can: bittiming: add PWM parameters
  ...
====================

Link: https://patch.msgid.link/20251126120106.154635-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-27 15:45:17 +01:00
Kuniyuki Iwashima f07f4ea53e mptcp: Initialise rcv_mss before calling tcp_send_active_reset() in mptcp_do_fastclose().
syzbot reported divide-by-zero in __tcp_select_window() by
MPTCP socket. [0]

We had a similar issue for the bare TCP and fixed in commit
499350a5a6 ("tcp: initialize rcv_mss to TCP_MIN_MSS instead
of 0").

Let's apply the same fix to mptcp_do_fastclose().
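
A hedged sketch of the shape of such a fix (exact placement within
mptcp_do_fastclose() may differ):

  /* __tcp_select_window() divides by icsk_ack.rcv_mss, which is still 0
   * on a subflow that never received data; seed it like the bare-TCP
   * fix referenced above does before sending the reset.
   */
  inet_csk(ssk)->icsk_ack.rcv_mss = TCP_MIN_MSS;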

[0]:
Oops: divide error: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6068 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:__tcp_select_window+0x824/0x1320 net/ipv4/tcp_output.c:3336
Code: ff ff ff 44 89 f1 d3 e0 89 c1 f7 d1 41 01 cc 41 21 c4 e9 a9 00 00 00 e8 ca 49 01 f8 e9 9c 00 00 00 e8 c0 49 01 f8 44 89 e0 99 <f7> 7c 24 1c 41 29 d4 48 bb 00 00 00 00 00 fc ff df e9 80 00 00 00
RSP: 0018:ffffc90003017640 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88807b469e40
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90003017730 R08: ffff888033268143 R09: 1ffff1100664d028
R10: dffffc0000000000 R11: ffffed100664d029 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  000055557faa0500(0000) GS:ffff888126135000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f64a1912ff8 CR3: 0000000072122000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 tcp_select_window net/ipv4/tcp_output.c:281 [inline]
 __tcp_transmit_skb+0xbc7/0x3aa0 net/ipv4/tcp_output.c:1568
 tcp_transmit_skb net/ipv4/tcp_output.c:1649 [inline]
 tcp_send_active_reset+0x2d1/0x5b0 net/ipv4/tcp_output.c:3836
 mptcp_do_fastclose+0x27e/0x380 net/mptcp/protocol.c:2793
 mptcp_disconnect+0x238/0x710 net/mptcp/protocol.c:3253
 mptcp_sendmsg_fastopen+0x2f8/0x580 net/mptcp/protocol.c:1776
 mptcp_sendmsg+0x1774/0x1980 net/mptcp/protocol.c:1855
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg+0xe5/0x270 net/socket.c:742
 __sys_sendto+0x3bd/0x520 net/socket.c:2244
 __do_sys_sendto net/socket.c:2251 [inline]
 __se_sys_sendto net/socket.c:2247 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2247
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f66e998f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffff9acedb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f66e9be5fa0 RCX: 00007f66e998f749
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007ffff9acee10 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007f66e9be5fa0 R14: 00007f66e9be5fa0 R15: 0000000000000006
 </TASK>

Fixes: ae15506024 ("mptcp: fix duplicate reset on fastclose")
Reported-by: syzbot+3a92d359bc2ec6255a33@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69260882.a70a0220.d98e3.00b4.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251125195331.309558-1-kuniyu@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-27 13:10:16 +01:00
Jeremy Kerr b3e528a581 net: mctp: unconditionally set skb->dev on dst output
On transmit, we are currently relying on skb->dev being set by
mctp_local_output() when we first set up the skb destination fields.
However, forwarded skbs do not use the local_output path, so will retain
their incoming netdev as their ->dev on tx. This does not work when
we're forwarding between interfaces.

Set skb->dev unconditionally in the transmit path, to allow for proper
forwarding.

We keep the skb->dev initialisation in mctp_local_output(), as we use it
for fragmentation.
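
A hedged sketch of the intended shape in the dst output path (the
egress_netdev name is illustrative for whatever net_device the MCTP dst
resolves to):

  /* Forwarded skbs still carry the ingress netdev at this point;
   * always point skb->dev at the egress device before transmit.
   */
  skb->dev = egress_netdev;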

Fixes: 269936db5e ("net: mctp: separate routing database from routing operations")
Suggested-by: Vince Chang <vince_chang@aspeedtech.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/20251125-dev-forward-v1-1-54ecffcd0616@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-27 11:39:12 +01:00
ziming zhang 7fce830ecd libceph: prevent potential out-of-bounds writes in handle_auth_session_key()
The len field originates from untrusted network packets. Boundary
checks have been added to prevent potential out-of-bounds writes when
decrypting the connection secret or processing service tickets.

[ idryomov: changelog ]

Cc: stable@vger.kernel.org
Signed-off-by: ziming zhang <ezrakiez@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-11-27 09:59:49 +01:00
ziming zhang ec3797f043 libceph: replace BUG_ON with bounds check for map->max_osd
OSD indexes come from untrusted network packets. Boundary checks are
added to validate these against map->max_osd.

[ idryomov: drop BUG_ON in ceph_get_primary_affinity(), minor cosmetic
  edits ]

Cc: stable@vger.kernel.org
Signed-off-by: ziming zhang <ezrakiez@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-11-27 09:59:42 +01:00
Viacheslav Dubeyko 43962db4a6 ceph: fix crash in process_v2_sparse_read() for encrypted directories
A crash in process_v2_sparse_read() for fscrypt-encrypted directories
has been reported. The issue takes place with the Ceph msgr2 protocol in
secure mode. It can be reproduced with the following steps:

sudo mount -t ceph :/ /mnt/cephfs/ -o name=admin,fs=cephfs,ms_mode=secure

(1) mkdir /mnt/cephfs/fscrypt-test-3
(2) cp area_decrypted.tar /mnt/cephfs/fscrypt-test-3
(3) fscrypt encrypt --source=raw_key --key=./my.key /mnt/cephfs/fscrypt-test-3
(4) fscrypt lock /mnt/cephfs/fscrypt-test-3
(5) fscrypt unlock --key=my.key /mnt/cephfs/fscrypt-test-3
(6) cat /mnt/cephfs/fscrypt-test-3/area_decrypted.tar
(7) Issue has been triggered

[  408.072247] ------------[ cut here ]------------
[  408.072251] WARNING: CPU: 1 PID: 392 at net/ceph/messenger_v2.c:865
ceph_con_v2_try_read+0x4b39/0x72f0
[  408.072267] Modules linked in: intel_rapl_msr intel_rapl_common
intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery
pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass
polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse
serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg
pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[  408.072304] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Not tainted 6.17.0-rc7+
[  408.072307] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.17.0-5.fc42 04/01/2014
[  408.072310] Workqueue: ceph-msgr ceph_con_workfn
[  408.072314] RIP: 0010:ceph_con_v2_try_read+0x4b39/0x72f0
[  408.072317] Code: c7 c1 20 f0 d4 ae 50 31 d2 48 c7 c6 60 27 d5 ae 48 c7 c7 f8
8e 6f b0 68 60 38 d5 ae e8 00 47 61 fe 48 83 c4 18 e9 ac fc ff ff <0f> 0b e9 06
fe ff ff 4c 8b 9d 98 fd ff ff 0f 84 64 e7 ff ff 89 85
[  408.072319] RSP: 0018:ffff88811c3e7a30 EFLAGS: 00010246
[  408.072322] RAX: ffffed1024874c6f RBX: ffffea00042c2b40 RCX: 0000000000000f38
[  408.072324] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  408.072325] RBP: ffff88811c3e7ca8 R08: 0000000000000000 R09: 00000000000000c8
[  408.072326] R10: 00000000000000c8 R11: 0000000000000000 R12: 00000000000000c8
[  408.072327] R13: dffffc0000000000 R14: ffff8881243a6030 R15: 0000000000003000
[  408.072329] FS:  0000000000000000(0000) GS:ffff88823eadf000(0000)
knlGS:0000000000000000
[  408.072331] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  408.072332] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[  408.072336] PKRU: 55555554
[  408.072337] Call Trace:
[  408.072338]  <TASK>
[  408.072340]  ? sched_clock_noinstr+0x9/0x10
[  408.072344]  ? __pfx_ceph_con_v2_try_read+0x10/0x10
[  408.072347]  ? _raw_spin_unlock+0xe/0x40
[  408.072349]  ? finish_task_switch.isra.0+0x15d/0x830
[  408.072353]  ? __kasan_check_write+0x14/0x30
[  408.072357]  ? mutex_lock+0x84/0xe0
[  408.072359]  ? __pfx_mutex_lock+0x10/0x10
[  408.072361]  ceph_con_workfn+0x27e/0x10e0
[  408.072364]  ? metric_delayed_work+0x311/0x2c50
[  408.072367]  process_one_work+0x611/0xe20
[  408.072371]  ? __kasan_check_write+0x14/0x30
[  408.072373]  worker_thread+0x7e3/0x1580
[  408.072375]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  408.072378]  ? __pfx_worker_thread+0x10/0x10
[  408.072381]  kthread+0x381/0x7a0
[  408.072383]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  408.072385]  ? __pfx_kthread+0x10/0x10
[  408.072387]  ? __kasan_check_write+0x14/0x30
[  408.072389]  ? recalc_sigpending+0x160/0x220
[  408.072392]  ? _raw_spin_unlock_irq+0xe/0x50
[  408.072394]  ? calculate_sigpending+0x78/0xb0
[  408.072395]  ? __pfx_kthread+0x10/0x10
[  408.072397]  ret_from_fork+0x2b6/0x380
[  408.072400]  ? __pfx_kthread+0x10/0x10
[  408.072402]  ret_from_fork_asm+0x1a/0x30
[  408.072406]  </TASK>
[  408.072407] ---[ end trace 0000000000000000 ]---
[  408.072418] Oops: general protection fault, probably for non-canonical
address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
[  408.072984] KASAN: null-ptr-deref in range [0x0000000000000000-
0x0000000000000007]
[  408.073350] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Tainted: G        W
6.17.0-rc7+ #1 PREEMPT(voluntary)
[  408.073886] Tainted: [W]=WARN
[  408.074042] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.17.0-5.fc42 04/01/2014
[  408.074468] Workqueue: ceph-msgr ceph_con_workfn
[  408.074694] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80
[  408.074976] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00
00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16
84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03
[  408.075884] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246
[  408.076305] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000
[  408.076909] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378
[  408.077466] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8
[  408.078034] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71
[  408.078575] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378
[  408.079159] FS:  0000000000000000(0000) GS:ffff88823eadf000(0000)
knlGS:0000000000000000
[  408.079736] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  408.080039] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[  408.080376] PKRU: 55555554
[  408.080513] Call Trace:
[  408.080630]  <TASK>
[  408.080729]  ceph_con_v2_try_read+0x49b9/0x72f0
[  408.081115]  ? __pfx_ceph_con_v2_try_read+0x10/0x10
[  408.081348]  ? _raw_spin_unlock+0xe/0x40
[  408.081538]  ? finish_task_switch.isra.0+0x15d/0x830
[  408.081768]  ? __kasan_check_write+0x14/0x30
[  408.081986]  ? mutex_lock+0x84/0xe0
[  408.082160]  ? __pfx_mutex_lock+0x10/0x10
[  408.082343]  ceph_con_workfn+0x27e/0x10e0
[  408.082529]  ? metric_delayed_work+0x311/0x2c50
[  408.082737]  process_one_work+0x611/0xe20
[  408.082948]  ? __kasan_check_write+0x14/0x30
[  408.083156]  worker_thread+0x7e3/0x1580
[  408.083331]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  408.083557]  ? __pfx_worker_thread+0x10/0x10
[  408.083751]  kthread+0x381/0x7a0
[  408.083922]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  408.084139]  ? __pfx_kthread+0x10/0x10
[  408.084310]  ? __kasan_check_write+0x14/0x30
[  408.084510]  ? recalc_sigpending+0x160/0x220
[  408.084708]  ? _raw_spin_unlock_irq+0xe/0x50
[  408.084917]  ? calculate_sigpending+0x78/0xb0
[  408.085138]  ? __pfx_kthread+0x10/0x10
[  408.085335]  ret_from_fork+0x2b6/0x380
[  408.085525]  ? __pfx_kthread+0x10/0x10
[  408.085720]  ret_from_fork_asm+0x1a/0x30
[  408.085922]  </TASK>
[  408.086036] Modules linked in: intel_rapl_msr intel_rapl_common
intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery
pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass
polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse
serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg
pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[  408.087778] ---[ end trace 0000000000000000 ]---
[  408.088007] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80
[  408.088260] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00
00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16
84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03
[  408.089118] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246
[  408.089357] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000
[  408.089678] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378
[  408.090020] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8
[  408.090360] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71
[  408.090687] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378
[  408.091035] FS:  0000000000000000(0000) GS:ffff88823eadf000(0000)
knlGS:0000000000000000
[  408.091452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  408.092015] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[  408.092530] PKRU: 55555554
[  417.112915]
==================================================================
[  417.113491] BUG: KASAN: slab-use-after-free in
__mutex_lock.constprop.0+0x1522/0x1610
[  417.114014] Read of size 4 at addr ffff888124870034 by task kworker/2:0/4951

[  417.114587] CPU: 2 UID: 0 PID: 4951 Comm: kworker/2:0 Tainted: G      D W
6.17.0-rc7+ #1 PREEMPT(voluntary)
[  417.114592] Tainted: [D]=DIE, [W]=WARN
[  417.114593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.17.0-5.fc42 04/01/2014
[  417.114596] Workqueue: events handle_timeout
[  417.114601] Call Trace:
[  417.114602]  <TASK>
[  417.114604]  dump_stack_lvl+0x5c/0x90
[  417.114610]  print_report+0x171/0x4dc
[  417.114613]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  417.114617]  ? kasan_complete_mode_report_info+0x80/0x220
[  417.114621]  kasan_report+0xbd/0x100
[  417.114625]  ? __mutex_lock.constprop.0+0x1522/0x1610
[  417.114628]  ? __mutex_lock.constprop.0+0x1522/0x1610
[  417.114630]  __asan_report_load4_noabort+0x14/0x30
[  417.114633]  __mutex_lock.constprop.0+0x1522/0x1610
[  417.114635]  ? queue_con_delay+0x8d/0x200
[  417.114638]  ? __pfx___mutex_lock.constprop.0+0x10/0x10
[  417.114641]  ? __send_subscribe+0x529/0xb20
[  417.114644]  __mutex_lock_slowpath+0x13/0x20
[  417.114646]  mutex_lock+0xd4/0xe0
[  417.114649]  ? __pfx_mutex_lock+0x10/0x10
[  417.114652]  ? ceph_monc_renew_subs+0x2a/0x40
[  417.114654]  ceph_con_keepalive+0x22/0x110
[  417.114656]  handle_timeout+0x6b3/0x11d0
[  417.114659]  ? _raw_spin_unlock_irq+0xe/0x50
[  417.114662]  ? __pfx_handle_timeout+0x10/0x10
[  417.114664]  ? queue_delayed_work_on+0x8e/0xa0
[  417.114669]  process_one_work+0x611/0xe20
[  417.114672]  ? __kasan_check_write+0x14/0x30
[  417.114676]  worker_thread+0x7e3/0x1580
[  417.114678]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  417.114682]  ? __pfx_sched_setscheduler_nocheck+0x10/0x10
[  417.114687]  ? __pfx_worker_thread+0x10/0x10
[  417.114689]  kthread+0x381/0x7a0
[  417.114692]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  417.114694]  ? __pfx_kthread+0x10/0x10
[  417.114697]  ? __kasan_check_write+0x14/0x30
[  417.114699]  ? recalc_sigpending+0x160/0x220
[  417.114703]  ? _raw_spin_unlock_irq+0xe/0x50
[  417.114705]  ? calculate_sigpending+0x78/0xb0
[  417.114707]  ? __pfx_kthread+0x10/0x10
[  417.114710]  ret_from_fork+0x2b6/0x380
[  417.114713]  ? __pfx_kthread+0x10/0x10
[  417.114715]  ret_from_fork_asm+0x1a/0x30
[  417.114720]  </TASK>

[  417.125171] Allocated by task 2:
[  417.125333]  kasan_save_stack+0x26/0x60
[  417.125522]  kasan_save_track+0x14/0x40
[  417.125742]  kasan_save_alloc_info+0x39/0x60
[  417.125945]  __kasan_slab_alloc+0x8b/0xb0
[  417.126133]  kmem_cache_alloc_node_noprof+0x13b/0x460
[  417.126381]  copy_process+0x320/0x6250
[  417.126595]  kernel_clone+0xb7/0x840
[  417.126792]  kernel_thread+0xd6/0x120
[  417.126995]  kthreadd+0x85c/0xbe0
[  417.127176]  ret_from_fork+0x2b6/0x380
[  417.127378]  ret_from_fork_asm+0x1a/0x30

[  417.127692] Freed by task 0:
[  417.127851]  kasan_save_stack+0x26/0x60
[  417.128057]  kasan_save_track+0x14/0x40
[  417.128267]  kasan_save_free_info+0x3b/0x60
[  417.128491]  __kasan_slab_free+0x6c/0xa0
[  417.128708]  kmem_cache_free+0x182/0x550
[  417.128906]  free_task+0xeb/0x140
[  417.129070]  __put_task_struct+0x1d2/0x4f0
[  417.129259]  __put_task_struct_rcu_cb+0x15/0x20
[  417.129480]  rcu_do_batch+0x3d3/0xe70
[  417.129681]  rcu_core+0x549/0xb30
[  417.129839]  rcu_core_si+0xe/0x20
[  417.130005]  handle_softirqs+0x160/0x570
[  417.130190]  __irq_exit_rcu+0x189/0x1e0
[  417.130369]  irq_exit_rcu+0xe/0x20
[  417.130531]  sysvec_apic_timer_interrupt+0x9f/0xd0
[  417.130768]  asm_sysvec_apic_timer_interrupt+0x1b/0x20

[  417.131082] Last potentially related work creation:
[  417.131305]  kasan_save_stack+0x26/0x60
[  417.131484]  kasan_record_aux_stack+0xae/0xd0
[  417.131695]  __call_rcu_common+0xcd/0x14b0
[  417.131909]  call_rcu+0x31/0x50
[  417.132071]  delayed_put_task_struct+0x128/0x190
[  417.132295]  rcu_do_batch+0x3d3/0xe70
[  417.132478]  rcu_core+0x549/0xb30
[  417.132658]  rcu_core_si+0xe/0x20
[  417.132808]  handle_softirqs+0x160/0x570
[  417.132993]  __irq_exit_rcu+0x189/0x1e0
[  417.133181]  irq_exit_rcu+0xe/0x20
[  417.133353]  sysvec_apic_timer_interrupt+0x9f/0xd0
[  417.133584]  asm_sysvec_apic_timer_interrupt+0x1b/0x20

[  417.133921] Second to last potentially related work creation:
[  417.134183]  kasan_save_stack+0x26/0x60
[  417.134362]  kasan_record_aux_stack+0xae/0xd0
[  417.134566]  __call_rcu_common+0xcd/0x14b0
[  417.134782]  call_rcu+0x31/0x50
[  417.134929]  put_task_struct_rcu_user+0x58/0xb0
[  417.135143]  finish_task_switch.isra.0+0x5d3/0x830
[  417.135366]  __schedule+0xd30/0x5100
[  417.135534]  schedule_idle+0x5a/0x90
[  417.135712]  do_idle+0x25f/0x410
[  417.135871]  cpu_startup_entry+0x53/0x70
[  417.136053]  start_secondary+0x216/0x2c0
[  417.136233]  common_startup_64+0x13e/0x141

[  417.136894] The buggy address belongs to the object at ffff888124870000
                which belongs to the cache task_struct of size 10504
[  417.138122] The buggy address is located 52 bytes inside of
                freed 10504-byte region [ffff888124870000, ffff888124872908)

[  417.139465] The buggy address belongs to the physical page:
[  417.140016] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
pfn:0x124870
[  417.140789] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0
pincount:0
[  417.141519] memcg:ffff88811aa20e01
[  417.141874] anon flags:
0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[  417.142600] page_type: f5(slab)
[  417.142922] raw: 0017ffffc0000040 ffff88810094f040 0000000000000000
dead000000000001
[  417.143554] raw: 0000000000000000 0000000000030003 00000000f5000000
ffff88811aa20e01
[  417.143954] head: 0017ffffc0000040 ffff88810094f040 0000000000000000
dead000000000001
[  417.144329] head: 0000000000000000 0000000000030003 00000000f5000000
ffff88811aa20e01
[  417.144710] head: 0017ffffc0000003 ffffea0004921c01 00000000ffffffff
00000000ffffffff
[  417.145106] head: ffffffffffffffff 0000000000000000 00000000ffffffff
0000000000000008
[  417.145485] page dumped because: kasan: bad access detected

[  417.145859] Memory state around the buggy address:
[  417.146094]  ffff88812486ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
fc
[  417.146439]  ffff88812486ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
fc
[  417.146791] >ffff888124870000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[  417.147145]                                      ^
[  417.147387]  ffff888124870080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[  417.147751]  ffff888124870100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
fb
[  417.148123]
==================================================================

First of all, we have a warning in get_bvec_at() because
cursor->total_resid contains a zero value. And, finally,
we have a crash in ceph_msg_data_advance() because
cursor->data is NULL. It means that get_bvec_at()
receives a ceph_msg_data_cursor structure that was never
initialized: data is NULL and total_resid contains zero.

Moreover, there is no such issue for the Ceph msgr1
protocol, because ceph_msg_data_cursor_init() has been
called before reading sparse data.

This patch adds a call to ceph_msg_data_cursor_init()
at the beginning of process_v2_sparse_read() to
guarantee that the sparse data read logic works
correctly for the Ceph msgr2 protocol.
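
A hedged sketch of where the initialisation lands (the argument list is
approximated from the description, not copied from the patch):

  static int process_v2_sparse_read(...)
  {
          /* In secure mode the cursor was never set up before this
           * point, leaving cursor->data NULL and total_resid == 0;
           * initialise it before the sparse-read loop consumes data.
           */
          ceph_msg_data_cursor_init(cursor, msg, data_len);
          ...
  }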

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/73152
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-11-27 09:59:34 +01:00
Jakub Kicinski 8ec205e879 linux-can-fixes-for-6.18-20251126
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEn/sM2K9nqF/8FWzzDHRl3/mQkZwFAmknHroTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAMdGXf+ZCRnENfCACmEL7NELoRi3RvXzlYOjDi0aEW2C79
 k9hPBkzBor/W9JhBOf7+3nbDiL8nJdMQIs3FY6X/OSgyxEvGwgXb+PiFVBaHtDmx
 PvJb7PjwNjpm4GnoPlQVUGBZxH4fDWz7JS34M1+BbY8k1ajN6ndHURHE6rtGMFIM
 obClghS9i4K/Uww/bpSzj5cU4AmIuLIU8X6Oqq2X1PacgtGYP8WNh/eQ1xk03Nt8
 RxrU0XTkPTsbS74Y4h+1bwNhxMwjLA97f2GgZ6GUa22lrzDnBpfqpBWSy0xIDJTe
 GZgKpBOhbFSRjPH8f89cRXz47jvMfReC05ccuKC82+RK94Yge8F1YIxP
 =HPfp
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-6.18-20251126' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2025-11-26

this is a pull request of 8 patches for net/main.

Seungjin Bae provides a patch for the kvaser_usb driver to fix a
potential infinite loop in the USB data stream command parser.

Thomas Mühlbacher's patch fixes the sja1000 driver IRQ handler's max
loop handling, which might lead to unhandled interrupts.

3 patches by me for the gs_usb driver fix handling of failed transmit
URBs and add checking of the actual length of received URBs before
accessing the data.

The next patch is by me and ports Thomas Mühlbacher's fix (IRQ handler's
max loop handling that might lead to unhandled interrupts) to the
sun4i_can driver.

Biju Das provides a patch for the rcar_canfd driver to fix the CAN-FD
mode setting.

The last patch is by Shaurya Rane for the em_canid filter to ensure
that the complete CAN frame is present in the linear data buffer
before accessing it.

* tag 'linux-can-fixes-for-6.18-20251126' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  net/sched: em_canid: fix uninit-value in em_canid_match
  can: rcar_canfd: Fix CAN-FD mode as default
  can: sun4i_can: sun4i_can_interrupt(): fix max irq loop handling
  can: gs_usb: gs_usb_receive_bulk_callback(): check actual_length before accessing data
  can: gs_usb: gs_usb_receive_bulk_callback(): check actual_length before accessing header
  can: gs_usb: gs_usb_xmit_callback(): fix handling of failed transmitted URBs
  can: sja1000: fix max irq loop handling
  can: kvaser_usb: leaf: Fix potential infinite loop in command parsers
====================

Link: https://patch.msgid.link/20251126155713.217105-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-26 19:56:00 -08:00
Paolo Abeni 27fd028601 mptcp: clear scheduled subflows on retransmit
When __mptcp_retrans() kicks in, it schedules one or more subflows for
retransmission, but such subflows could actually be left alone if there
is no more data to retransmit and/or in case of a concurrent fallback.

Scheduled subflows could be processed much later in time, i.e. when new
data will be transmitted, leading to bad subflow selection.

Explicitly clear all scheduled subflows before leaving the
retransmission function.
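
A hedged sketch of what such clearing could look like at the end of
__mptcp_retrans() (helper names as recalled from the MPTCP code; treat
them as assumptions):

  /* don't leave stale scheduling decisions behind: new data may only be
   * sent much later, when the picked subflow is a poor choice
   */
  mptcp_for_each_subflow(msk, subflow)
          mptcp_subflow_set_scheduled(subflow, false);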

Fixes: ee2708aeda ("mptcp: use get_retrans wrapper")
Cc: stable@vger.kernel.org
Reported-by: Filip Pokryvka <fpokryvk@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251125-net-mptcp-clear-sched-rtx-v1-1-1cea4ad2165f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-26 18:24:35 -08:00
Vadim Fedorenko f467777efb phy: add hwtstamp_get callback to phy drivers
PHY devices lacked a hwtstamp_get callback even though most of them
track configuration info. Introduce the new callback in
mii_timestamper.
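
As a hedged illustration (the exact prototype may differ), the new
callback would sit next to the existing ones in struct mii_timestamper:

  struct mii_timestamper {
          ...
          /* report the currently programmed timestamping config,
           * mirroring the existing "set" style hwtstamp callback
           */
          int (*hwtstamp_get)(struct mii_timestamper *mii_ts,
                              struct kernel_hwtstamp_config *config);
          ...
  };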

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251124181151.277256-3-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-26 16:56:33 -08:00
Ilya Dryomov 85f5491d9c libceph: drop started parameter of __ceph_open_session()
With the previous commit revamping the timeout handling, started isn't
used anymore.  It could be taken into account by adjusting the initial
value of the timeout, but there is little point as both callers capture
the timestamp shortly before calling __ceph_open_session() -- the only
thing of note that happens in the interim is taking client->mount_mutex
and that isn't expected to take multiple seconds.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
2025-11-26 23:29:11 +01:00
Ilya Dryomov 076381c261 libceph: fix potential use-after-free in have_mon_and_osd_map()
The wait loop in __ceph_open_session() can race with the client
receiving a new monmap or osdmap shortly after the initial map is
received.  Both ceph_monc_handle_map() and handle_one_map() install
a new map immediately after freeing the old one

    kfree(monc->monmap);
    monc->monmap = monmap;

    ceph_osdmap_destroy(osdc->osdmap);
    osdc->osdmap = newmap;

under client->monc.mutex and client->osdc.lock respectively, but
because neither is taken in have_mon_and_osd_map() it's possible for
the client->monc.monmap->epoch and client->osdc.osdmap->epoch arms of the

    client->monc.monmap && client->monc.monmap->epoch &&
        client->osdc.osdmap && client->osdc.osdmap->epoch;

condition to dereference an already freed map.  This happens to be
reproducible with generic/395 and generic/397 with KASAN enabled:

    BUG: KASAN: slab-use-after-free in have_mon_and_osd_map+0x56/0x70
    Read of size 4 at addr ffff88811012d810 by task mount.ceph/13305
    CPU: 2 UID: 0 PID: 13305 Comm: mount.ceph Not tainted 6.14.0-rc2-build2+ #1266
    ...
    Call Trace:
    <TASK>
    have_mon_and_osd_map+0x56/0x70
    ceph_open_session+0x182/0x290
    ceph_get_tree+0x333/0x680
    vfs_get_tree+0x49/0x180
    do_new_mount+0x1a3/0x2d0
    path_mount+0x6dd/0x730
    do_mount+0x99/0xe0
    __do_sys_mount+0x141/0x180
    do_syscall_64+0x9f/0x100
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
    </TASK>

    Allocated by task 13305:
    ceph_osdmap_alloc+0x16/0x130
    ceph_osdc_init+0x27a/0x4c0
    ceph_create_client+0x153/0x190
    create_fs_client+0x50/0x2a0
    ceph_get_tree+0xff/0x680
    vfs_get_tree+0x49/0x180
    do_new_mount+0x1a3/0x2d0
    path_mount+0x6dd/0x730
    do_mount+0x99/0xe0
    __do_sys_mount+0x141/0x180
    do_syscall_64+0x9f/0x100
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

    Freed by task 9475:
    kfree+0x212/0x290
    handle_one_map+0x23c/0x3b0
    ceph_osdc_handle_map+0x3c9/0x590
    mon_dispatch+0x655/0x6f0
    ceph_con_process_message+0xc3/0xe0
    ceph_con_v1_try_read+0x614/0x760
    ceph_con_workfn+0x2de/0x650
    process_one_work+0x486/0x7c0
    process_scheduled_works+0x73/0x90
    worker_thread+0x1c8/0x2a0
    kthread+0x2ec/0x300
    ret_from_fork+0x24/0x40
    ret_from_fork_asm+0x1a/0x30

Rewrite the wait loop to check the above condition directly with
client->monc.mutex and client->osdc.lock taken as appropriate.  While
at it, improve the timeout handling (previously mount_timeout could be
exceeded in case wait_event_interruptible_timeout() slept more than
once) and access client->auth_err under client->monc.mutex to match
how it's set in finish_auth().

monmap_show() and osdmap_show() now take the respective lock before
accessing the map as well.
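
A minimal sketch of the locking pattern described above, assuming
monc.mutex is a mutex and osdc.lock an rwsem as in net/ceph (the
upstream rewrite of the wait loop differs in detail):

    static bool have_mon_and_osd_map(struct ceph_client *client)
    {
        bool have = false;

        mutex_lock(&client->monc.mutex);
        if (client->monc.monmap && client->monc.monmap->epoch) {
            down_read(&client->osdc.lock);
            have = client->osdc.osdmap && client->osdc.osdmap->epoch;
            up_read(&client->osdc.lock);
        }
        mutex_unlock(&client->monc.mutex);

        return have;
    }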

Cc: stable@vger.kernel.org
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
2025-11-26 23:29:10 +01:00
Gabriel Krisman Bertazi d73c167708 socket: Split out a getsockname helper for io_uring
Similar to getsockopt, split a helper that checks security and issues
the operation out of the main handler, so that io_uring can use it.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-26 13:45:23 -07:00
Gabriel Krisman Bertazi 4677e78800 socket: Unify getsockname and getpeername implementation
They are already implemented by the same get_name hook at the protocol
level.  Bring the unification one level up to reduce code duplication
in preparation for supporting these as io_uring operations.
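
For reference, both calls already funnel into the same proto_ops hook;
a minimal sketch of the shared path, with names as used in net/socket.c
(usockaddr/usockaddr_len are the syscall's user pointers). The unified
helper itself may look different:

    struct sockaddr_storage address;
    int err;

    /* peer == 0: local name (getsockname), peer == 1: remote name */
    err = sock->ops->getname(sock, (struct sockaddr *)&address, peer);
    if (err >= 0)
        /* on success, err is the address length to copy back */
        err = move_addr_to_user(&address, err, usockaddr, usockaddr_len);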

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-26 13:45:23 -07:00
Shaurya Rane 0c922106d7 net/sched: em_canid: fix uninit-value in em_canid_match
Use pskb_may_pull() to ensure a complete CAN frame is present in the
linear data buffer before reading the CAN ID. A simple skb->len check
is insufficient because it only verifies the total data length but does
not guarantee the data is present in skb->data (it could be in
fragments).

pskb_may_pull() both validates the length and pulls fragmented data
into the linear buffer if necessary, making it safe to directly
access skb->data.
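
A minimal sketch of the pattern, assuming the standard ematch ->match()
signature (the surrounding em_canid rule matching is elided):

    static int em_canid_match(struct sk_buff *skb, struct tcf_ematch *m,
                              struct tcf_pkt_info *info)
    {
        struct can_frame *cf;

        /* Validates the length and, if needed, pulls fragmented data
         * into the linear area; a bare skb->len check cannot do that.
         */
        if (!pskb_may_pull(skb, sizeof(struct can_frame)))
            return 0;    /* treat an incomplete frame as no match */

        cf = (struct can_frame *)skb->data;
        /* ... match cf->can_id against the configured rules ... */
        return 0;
    }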

Reported-by: syzbot+5d8269a1e099279152bc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=5d8269a1e099279152bc
Fixes: f057bbb6f9 ("net: em_canid: Ematch rule to match CAN frames according to their identifiers")
Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>
Link: https://patch.msgid.link/20251126085718.50808-1-ssranevjti@gmail.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26 16:28:10 +01:00
Oliver Hartkopp 1a620a7238 can: raw: instantly reject unsupported CAN frames
For real CAN interfaces the CAN_CTRLMODE_FD and CAN_CTRLMODE_XL control
modes indicate whether an interface can handle those CAN FD/XL frames.

If a CAN XL interface is configured in CANXL-only mode with error
signalling disabled, neither CAN CC nor CAN FD frames can be sent.

The checks are performed on CAN_RAW sockets to give an instant feedback
to the user when writing unsupported CAN frames to the interface.
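
Conceptually the per-frame check boils down to the sketch below.  This
is illustrative only: the real check also has to cope with virtual CAN
devices and the CAN XL mode bits, and its placement in raw_sendmsg()
may differ.

    /* dev is the bound CAN netdevice, priv its driver-private state */
    struct can_priv *priv = netdev_priv(dev);

    /* Reject frame types the interface cannot transmit. */
    if (can_is_canfd_skb(skb) && !(priv->ctrlmode & CAN_CTRLMODE_FD))
        return -EINVAL;   /* CAN FD frame on a Classical-CAN-only device */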

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20251126-canxl-v8-16-e7e3eb74f889@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26 11:20:44 +01:00
Lachlan Hodges f9e788c5fd wifi: mac80211: allow sharing identical chanctx for S1G interfaces
Introduce support for sharing identical channel contexts for S1G
interfaces. Additionally, do not downgrade channel requests for
S1G interfaces.

Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com>
Link: https://patch.msgid.link/20251126015758.149034-1-lachlan.hodges@morsemicro.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-26 10:34:51 +01:00
Fernando Fernandez Mancera 0ebc27a4c6 xsk: avoid data corruption on cq descriptor number
Since commit 30f241fcf5 ("xsk: Fix immature cq descriptor
production"), the descriptor number is stored in skb control block and
xsk_cq_submit_addr_locked() relies on it to put the umem addrs onto
pool's completion queue.

The skb control block shouldn't be used for this purpose because, after
transmit, xsk no longer has control over it and other subsystems could
use it. This leads to the following kernel panic due to a NULL pointer
dereference.

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 2 UID: 1 PID: 927 Comm: p4xsk.bin Not tainted 6.16.12+deb14-cloud-amd64 #1 PREEMPT(lazy)  Debian 6.16.12-1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
 RIP: 0010:xsk_destruct_skb+0xd0/0x180
 [...]
 Call Trace:
  <IRQ>
  ? napi_complete_done+0x7a/0x1a0
  ip_rcv_core+0x1bb/0x340
  ip_rcv+0x30/0x1f0
  __netif_receive_skb_one_core+0x85/0xa0
  process_backlog+0x87/0x130
  __napi_poll+0x28/0x180
  net_rx_action+0x339/0x420
  handle_softirqs+0xdc/0x320
  ? handle_edge_irq+0x90/0x1e0
  do_softirq.part.0+0x3b/0x60
  </IRQ>
  <TASK>
  __local_bh_enable_ip+0x60/0x70
  __dev_direct_xmit+0x14e/0x1f0
  __xsk_generic_xmit+0x482/0xb70
  ? __remove_hrtimer+0x41/0xa0
  ? __xsk_generic_xmit+0x51/0xb70
  ? _raw_spin_unlock_irqrestore+0xe/0x40
  xsk_sendmsg+0xda/0x1c0
  __sys_sendto+0x1ee/0x200
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x84/0x2f0
  ? __pfx_pollwake+0x10/0x10
  ? __rseq_handle_notify_resume+0xad/0x4c0
  ? restore_fpregs_from_fpstate+0x3c/0x90
  ? switch_fpu_return+0x5b/0xe0
  ? do_syscall_64+0x204/0x2f0
  ? do_syscall_64+0x204/0x2f0
  ? do_syscall_64+0x204/0x2f0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  </TASK>
 [...]
 Kernel panic - not syncing: Fatal exception in interrupt
 Kernel Offset: 0x1c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Instead use the skb destructor_arg pointer along with pointer tagging.
As pointers are always aligned to 8B, use the bottom bit to indicate
whether this is a single address or an allocated struct containing several
addresses.
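
A minimal sketch of the tagging scheme; xsk_addr_list and the helper
names below are illustrative, not the upstream identifiers:

    #define XSK_ADDRS_TAG 0x1UL   /* bit 0: destructor_arg holds a list */

    static void *xsk_pack_addrs(struct xsk_addr_list *addrs)
    {
        /* allocated pointers are at least 8-byte aligned, bit 0 is free */
        return (void *)((unsigned long)addrs | XSK_ADDRS_TAG);
    }

    static bool xsk_arg_is_addrs(void *arg)
    {
        return (unsigned long)arg & XSK_ADDRS_TAG;
    }

    static struct xsk_addr_list *xsk_unpack_addrs(void *arg)
    {
        return (struct xsk_addr_list *)((unsigned long)arg & ~XSK_ADDRS_TAG);
    }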

Fixes: 30f241fcf5 ("xsk: Fix immature cq descriptor production")
Closes: https://lore.kernel.org/netdev/0435b904-f44f-48f8-afb0-68868474bf1c@nop.hu/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20251124171409.3845-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:51:50 -08:00
Eric Dumazet 9a5e5334ad tcp: remove icsk->icsk_retransmit_timer
Now that sk->sk_timer is no longer used by TCP keepalive, we can use
its storage for the TCP and MPTCP retransmit timers, for better
cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:28:29 -08:00
Eric Dumazet 08dfe37023 tcp: introduce icsk->icsk_keepalive_timer
sk->sk_timer has been used for TCP keepalives.

Keepalive timers are not in the fast path; we want to use the
sk->sk_timer storage for retransmit timers, for better cache locality.

Create icsk->icsk_keepalive_timer and change keepalive
code to no longer use sk->sk_timer.

Added space is reclaimed in the following patch.

This includes changes to MPTCP, which was also using sk_timer.

Alias icsk->mptcp_tout_timer and icsk->icsk_keepalive_timer
for inet_sk_diag_fill() sake.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:28:29 -08:00
Eric Dumazet 27e8257a86 net: move sk_dst_pending_confirm and sk_pacing_status to sock_read_tx group
These two fields are mostly read in the TCP tx path; move them into a
more appropriate group for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:28:29 -08:00
Eric Dumazet 3a6e8fd0bf tcp: rename icsk_timeout() to tcp_timeout_expires()
In preparation for the introduction of sk->tcp_timeout_timer,
rename the icsk_timeout() helper and change its argument to a plain
'const struct sock *sk'.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:28:28 -08:00
Asbjørn Sloth Tønnesen 68e83f3472 tools: ynl-gen: add regeneration comment
Add a comment on regeneration to the generated files.

The comment is placed after the YNL-GEN line[1], so as not to interfere
with ynl-regen.sh's detection logic.

[1] and after the optional YNL-ARG line.

Link: https://lore.kernel.org/r/aR5m174O7pklKrMR@zx2c4.com/
Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251120174429.390574-3-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:20:42 -08:00
Vladimir Oltean f647ed2ca7 net: dsa: append ethtool counters of all hidden ports to conduit
Currently there is no way to see packet counters on cascade ports, and
no clarity on what the API for that would look like.

Because it's something that is currently needed, just extend the hack
where ethtool -S on the conduit interface dumps CPU port counters, and
also use it to dump counters of cascade ports.

Note that the "pXX_" naming convention changes to "sXX_pYY", to
distinguish between ports having the same index but belonging to
different switches. This has a slight chance of causing regressions to
existing tooling:

- grepping for "p04_counter_name" still works, but might return more
  than one string now
- grepping for "    p04_counter_name" no longer works

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251122112311.138784-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 17:33:55 -08:00
Vladimir Oltean 8afabd27fe net: dsa: use kernel data types for ethtool ops on conduit
Suppress some checkpatch 'CHECK' messages about u8 being preferable over
uint8_t, etc. No functional change.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251122112311.138784-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 17:33:55 -08:00
Vladimir Oltean eba81b0a6d net: dsa: cpu_dp->orig_ethtool_ops might be NULL
In theory this would have been seen by now, but it seems that all
drivers used as DSA conduit interfaces thus far have had ethtool_ops
set, and it's hard to even find modern Ethernet drivers (and not VF
ones) which don't use ethtool.

Here is the unfiltered list of drivers which register any sort of
net_device but don't set its ethtool_ops pointer. I don't think any of
them 'risks' being used as a DSA conduit, except maybe for moxart,
rnpgbe and icssm, I'm not sure.

- drivers/net/can/dev/dev.c
- drivers/net/wwan/qcom_bam_dmux.c
- drivers/net/wwan/t7xx/t7xx_netdev.c
- drivers/net/arcnet/arcnet.c
- drivers/net/hamradio/
- drivers/net/slip/slip.c
- drivers/net/ethernet/ezchip/nps_enet.c
- drivers/net/ethernet/moxa/moxart_ether.c
- drivers/net/ethernet/wangxun/txgbevf/txgbevf_main.c
- drivers/net/ethernet/wangxun/ngbevf/ngbevf_main.c
- drivers/net/ethernet/huawei/hinic3/hinic3_main.c
- drivers/net/ethernet/i825xx/
- drivers/net/ethernet/ti/icssm/icssm_prueth.c
- drivers/net/ethernet/seeq/
- drivers/net/ethernet/litex/litex_liteeth.c
- drivers/net/ethernet/sunplus/spl2sw_driver.c
- drivers/net/ethernet/mucse/rnpgbe/rnpgbe_main.c
- drivers/net/ipa/
- drivers/net/wireless/microchip/wilc1000/
- drivers/net/wireless/mediatek/mt76/dma.c
- drivers/net/wireless/ath/ath12k/
- drivers/net/wireless/ath/ath11k/
- drivers/net/wireless/ath/ath6kl/
- drivers/net/wireless/ath/ath10k/
- drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c
- drivers/net/wireless/virtual/mac80211_hwsim.c
- drivers/net/wireless/quantenna/qtnfmac/pcie/pcie.c
- drivers/net/wireless/realtek/rtw89/core.c
- drivers/net/wireless/realtek/rtw88/pci.c
- drivers/net/caif/
- drivers/net/plip/
- drivers/net/wan/
- drivers/net/mctp/
- drivers/net/ppp/
- drivers/net/thunderbolt/

Nonetheless, it's good for the framework not to make such assumptions,
and not to panic when coming across such a host device in the future.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251122112311.138784-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 17:33:55 -08:00
Eric Dumazet a6efc273ab net_sched: use qdisc_dequeue_drop() in cake, codel, fq_codel
cake, codel and fq_codel can drop many packets from dequeue().

Use qdisc_dequeue_drop() so that the freeing can happen
outside of the qdisc spinlock scope.

Add TCQ_F_DEQUEUE_DROPS to sch->flags.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-15-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25 16:10:32 +01:00
Eric Dumazet 191ff13e42 net_sched: add qdisc_dequeue_drop() helper
Some qdiscs, like cake, codel and fq_codel, might drop packets
in their dequeue() method.

This is currently problematic because dequeue() runs with
the qdisc spinlock held. Freeing skbs can be extremely expensive.

Add a qdisc_dequeue_drop() method and a new TCQ_F_DEQUEUE_DROPS flag
so that these qdiscs can opt in to deferring the skb frees until after
the qdisc spinlock is released.

TCQ_F_DEQUEUE_DROPS is an attempt to not penalize other qdiscs
with an extra cache line miss.
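
Conceptually (helper names and placement are illustrative, not the
exact diff):

    /* A qdisc that defers its dequeue-time drops opts in at init: */
    sch->flags |= TCQ_F_DEQUEUE_DROPS;

    /*
     * Caller side: with the root lock held, q->dequeue() queues its
     * drops instead of freeing them; once the lock is released, the
     * deferred skbs are freed, each with its own drop reason.
     */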

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-14-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25 16:10:32 +01:00
Eric Dumazet 0170d7f47c net_sched: add tcf_kfree_skb_list() helper
Using kfree_skb_list_reason() to free a list of skbs from qdisc
operations seems wrong, as each skb might have a different drop reason.

Clean up __dev_xmit_skb() to call tcf_kfree_skb_list() once, in
preparation for the following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-13-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25 16:10:32 +01:00
Eric Dumazet 4792c3a4c1 net: annotate a data-race in __dev_xmit_skb()
q->limit is read locklessly, add a READ_ONCE().

Fixes: 100dfa74ca ("net: dev_queue_xmit() llist adoption")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-12-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25 16:10:32 +01:00
Eric Dumazet b2e9821cff net: prefetch skb->priority in __dev_xmit_skb()
Most qdiscs need to read skb->priority at enqueue time.

In commit 100dfa74ca ("net: dev_queue_xmit() llist adoption")
I added a prefetch(next); let's add another one for the second
half of the skb.

Note that skb->priority and skb->hash share a common cache line,
so this patch helps qdiscs needing both fields.
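
A sketch of the added hint, assuming the second prefetch targets the
priority/hash cache line of the next skb in the batch (the exact
expression in the patch may differ):

    prefetch(next);              /* existing: first part of the skb */
    prefetch(&next->priority);   /* added: the line shared with skb->hash */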

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-11-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25 16:10:32 +01:00
Eric Dumazet 2f9babc04d net_sched: sch_fq: prefetch one skb ahead in dequeue()
Prefetch the skb that we are likely to dequeue at the next dequeue() call.

Also call fq_dequeue_skb() a bit sooner in fq_dequeue().

This reduces the window between the read of q.qlen and
changes to fields in the cache line that could be dirtied
by another cpu trying to enqueue a packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-10-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25 16:10:32 +01:00
Eric Dumazet 3c1100f042 net_sched: sch_fq: move qdisc_bstats_update() to fq_dequeue_skb()
Group together changes to qdisc fields to reduce chances of false sharing
if another cpu attempts to acquire the qdisc spinlock.

  qdisc_qstats_backlog_dec(sch, skb);
  sch->q.qlen--;
  qdisc_bstats_update(sch, skb);

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-9-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25 16:10:32 +01:00