606 Commits (09cfd3c52ea76f43b3cb15e570aeddf633d65e80)

Author SHA1 Message Date
Mark Zhang c6b6677d85 net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
This patch adds new fields to support multi-plane and the extended port
counters group. Actual support will be added in the next patch.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/70221cdd79aad0e21cbf385d9567e3ebffbc5137.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01 15:38:05 +03:00
Mark Zhang 65528cfb21 net/mlx5: mlx5_ifc update for multi-plane support
Add new fields to support mlx5 multi-plane feature. Actual support will
be added in following patches.

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/36a74a1b1d2b7b59c99cda4abad1794ddde30230.1718553901.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01 15:10:15 +03:00
Daniel Jurgens 048a403648 net/mlx5: IFC updates for changing max EQs
Expose new capability to support changing the number of EQs available
to other functions.

Fixes: 93197c7c50 ("mlx5/core: Support max_io_eqs for a function")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28 12:58:10 +01:00
Patrisious Haddad b339e0a39d RDMA/mlx5: Add Qcounters req_transport_retries_exceeded/req_rnr_retries_exceeded
The req_transport_retries_exceeded counter shows the number of times the
requester detected a transport retries exceeded error.

The req_rnr_retries_exceeded counter shows the number of times the
requester detected an RNR NAK retries exceeded error.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/250466af94f4989d638fab168e246035530e912f.1718301543.git.leon@kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-16 18:53:23 +03:00
Rahul Rameshbabu 296eaab825 net/mlx5e: Support SWP-mode offload L4 csum calculation
Calculate the pseudo-header checksum for both IPSec transport mode and
IPSec tunnel mode for mlx5 devices that do not implement a pure hardware
checksum offload for L4 checksum calculation. Introduce a capability bit
that identifies such mlx5 devices.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14 18:53:24 -07:00
Cosmin Ratiu e575d3a6dd net/mlx5: Correct TASR typo into TSAR
TSAR is the correct spelling (Transmit Scheduling ARbiter).

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240613210036.1125203-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14 18:53:23 -07:00
Yoray Zack 99be56171f net/mlx5e: SHAMPO, Re-enable HW-GRO
Add back HW-GRO to the reported features.

As the current implementation of HW-GRO uses KSMs with a
specific fixed buffer size (256B) to map its headers buffer,
we report the feature only if the NIC supports KSM and
the minimum value for buffer size is below the requested one.

iperf3 bandwidth comparison:
+---------+--------+--------+-----------+
| streams | SW GRO | HW GRO | Unit      |
|---------+--------+--------+-----------|
| 1       | 36     | 42     | Gbits/sec |
| 4       | 34     | 39     | Gbits/sec |
| 8       | 31     | 35     | Gbits/sec |
+---------+--------+--------+-----------+

A downstream patch will add skb fragment coalescing which will improve
performance considerably.

Benchmark details:
VM based setup
CPU: Intel(R) Xeon(R) Platinum 8380 CPU, 24 cores
NIC: ConnectX-7 100GbE
iperf3 and irq running on same CPU over a single receive queue

Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240603212219.1037656-14-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 20:20:46 -07:00
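
As a quick reading of the iperf3 table in commit 99be56171f above, the relative gain of HW GRO over SW GRO can be worked out per stream count. This is only a sketch over the figures quoted in the commit message; the names `RESULTS` and `relative_gain` are illustrative and are not part of the driver.

```python
# Throughput figures (Gbits/sec) copied from the iperf3 table in the commit message.
# Keys are stream counts; values are (SW GRO, HW GRO).
RESULTS = {1: (36, 42), 4: (34, 39), 8: (31, 35)}


def relative_gain(sw: float, hw: float) -> float:
    """Return the HW GRO gain over SW GRO as a percentage of the SW GRO figure."""
    return (hw - sw) / sw * 100.0


if __name__ == "__main__":
    for streams, (sw, hw) in RESULTS.items():
        print(f"{streams} stream(s): {relative_gain(sw, hw):.1f}% higher with HW GRO")
```

On these numbers the gain shrinks as the stream count grows, which is consistent with the single-CPU, single-receive-queue setup described in the benchmark details.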
Gal Pressman 1b9f86c6d5 net/mlx5: Fix MTMP register capability offset in MCAM register
The MTMP register (0x900a) capability offset is off-by-one, move it to
the right place.

Fixes: 1f507e80c7 ("net/mlx5: Expose NIC temperature via hardware monitoring kernel API")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Rahul Rameshbabu 445a25f6e1 net/mlx5e: Support updating coalescing configuration without resetting channels
When CQE mode or DIM state is changed, gracefully reconfigure channels to
handle the new configuration. Previously, the driver would create new channels
that reflected the changes rather than update the original channels.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:22:16 -07:00
Rahul Rameshbabu eca1e8a628 net/mlx5e: Use DIM constants for CQ period mode parameter
Use core DIM CQ period mode enum values for the CQ parameter for the period
mode. Translate the value to the specific mlx5 device constant for the
selected period mode when creating a CQ. Avoid needing to translate mlx5
device constants to DIM constants for core DIM functionality.

Co-developed-by: Nabil S. Alramli <dev@nalramli.com>
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240419080445.417574-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 14:22:16 -07:00
Cosmin Ratiu 4aafb8ab2a net/mlx5e: Support FEC settings for 100G/lane modes
This consists of:
1. Expose the 100G/lane capability bit in the PCAM reg.
2. Expose the per link mode FEC capability masks in the PPLM reg.
3. Set the overrides according to ethtool parameters.
FEC for new modes is set if and only if the PCAM 100G/lane capability is
advertised and the capability mask for a given link mode reports that it
can accept the requested FEC mode.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240404173357.123307-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05 21:54:40 -07:00
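
The "if and only if" rule stated in commit 4aafb8ab2a above can be sketched as a single predicate. This is a minimal illustration, not the driver's code: the function name, the parameter names, and the bit values used in the usage lines are hypothetical and do not correspond to real PCAM or PPLM register layouts.

```python
def can_set_fec(pcam_100g_per_lane: bool,
                link_mode_fec_mask: int,
                requested_fec_bit: int) -> bool:
    """Return True only when both conditions from the commit message hold.

    pcam_100g_per_lane: whether the 100G/lane capability is advertised (assumed flag).
    link_mode_fec_mask: per link mode FEC capability mask (assumed bitmask).
    requested_fec_bit:  single bit for the requested FEC mode (assumed encoding).
    """
    mode_accepts_fec = bool(link_mode_fec_mask & requested_fec_bit)
    return pcam_100g_per_lane and mode_accepts_fec


if __name__ == "__main__":
    # Hypothetical mask with bits 0 and 2 set; request bit 2.
    print(can_set_fec(True, 0b101, 0b100))
    # Same request without the 100G/lane capability advertised.
    print(can_set_fec(False, 0b101, 0b100))
```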
Jianbo Liu c788d79cfa net/mlx5: Skip pages EQ creation for non-page supplier function
The device does not issue page events on the function if
page_request_disable is set, so there is no need to create the pages EQ.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240402133043.56322-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03 19:47:59 -07:00
Jianbo Liu 137f3d50ad net/mlx5: Support matching on l4_type for ttc_table
Replace matching on TCP and UDP protocols with new l4_type field which
is parsed by steering for ttc_table. It is enabled by the
outer_l4_type or inner_l4_type bits in nic_rx or port_sel flow table
capabilities and used only if pcc_ifa2 bit in HCA capabilities is set.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240402133043.56322-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03 19:47:59 -07:00
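
One way to read the capability conditions in commit 137f3d50ad above is sketched below. It assumes the outer and inner capability bits gate outer and inner header matching respectively, which the commit message implies but does not spell out. The class and function names are hypothetical; only the bit names `outer_l4_type`, `inner_l4_type` and `pcc_ifa2` come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class FlowTableCaps:
    """Assumed stand-in for the nic_rx or port_sel flow table capabilities."""
    outer_l4_type: bool = False
    inner_l4_type: bool = False


def use_l4_type(pcc_ifa2: bool, ft_caps: FlowTableCaps, inner: bool) -> bool:
    """Decide whether to match on l4_type instead of the TCP/UDP protocol.

    l4_type is used only if the HCA capability pcc_ifa2 is set and the
    relevant (outer or inner) flow table capability bit is set.
    """
    cap = ft_caps.inner_l4_type if inner else ft_caps.outer_l4_type
    return pcc_ifa2 and cap


if __name__ == "__main__":
    caps = FlowTableCaps(outer_l4_type=True)
    print(use_l4_type(True, caps, inner=False))
    print(use_l4_type(True, caps, inner=True))
```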
Linus Torvalds 4138f02288 VFIO updates for v6.9-rc1
- Add warning in unlikely case that device is not captured with
    driver_override. (Kunwu Chan)
 
  - Error handling improvements in mlx5-vfio-pci to detect firmware
    tracking object error states, logging of firmware error syndrome,
    and releasing of firmware resources in aborted migration sequence.
    (Yishai Hadas)
 
  - Correct an un-alphabetized VFIO MAINTAINERS entry.
    (Alex Williamson)
 
  - Make the mdev_bus_type const and also make the class struct const
    for a couple of the vfio-mdev sample drivers. (Ricardo B. Marliere)
 
  - Addition of a new vfio-pci variant driver for the GPU of NVIDIA's
    Grace-Hopper superchip.  During initialization of the chip-to-chip
    interconnect in this hardware module, the PCI BARs of the device
    become unused in favor of a faster, coherent mechanism for exposing
    device memory.  This driver primarily changes the VFIO
    representation of the device to masquerade this coherent aperture
    to replace the physical PCI BARs for userspace drivers.  This also
    incorporates use of a new vma flag allowing KVM to use write
    combining attributes for uncached device memory.  (Ankit Agrawal)
 
  - Reset fixes and cleanups for the pds-vfio-pci driver.  Save and
    restore files were previously leaked if the device didn't pass
    through an error state, this is resolved and later re-fixed to
    prevent access to the now freed files.  Reset handling is also
    refactored to remove the complicated deferred reset mechanism.
    (Brett Creeley)
 
  - Remove some references to pl330 in the vfio-platform amba
    driver. (Geert Uytterhoeven)
 
  - Remove twice redundant and ugly code to unpin incidental pins
    of the zero-page. (Alex Williamson)
 
  - Deferred reset logic is also removed from the hisi-acc-vfio-pci
    driver as a simplification. (Shameer Kolothum)
 
  - Enforce that mlx5-vfio-pci devices must support PRE_COPY and
    remove resulting unnecessary code.  There is no device firmware
    that has been available publicly without this support.
    (Yishai Hadas)
 
  - Switch over to using the .remove_new callback for vfio-platform
    in support of the broader transition for a void remove function.
    (Uwe Kleine-König)
 
  - Resolve multiple issues in interrupt code for VFIO bus drivers
    that allow calling eventfd_signal() on a NULL context.  This
    also removes a potential race in INTx setup on certain hardware
    for vfio-pci, races with various mechanisms to mask INTx, and
    leaked virqfds in vfio-platform. (Alex Williamson)

Merge tag 'vfio-v6.9-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Add warning in unlikely case that device is not captured with
   driver_override (Kunwu Chan)

 - Error handling improvements in mlx5-vfio-pci to detect firmware
   tracking object error states, logging of firmware error syndrome, and
   releasing of firmware resources in aborted migration sequence (Yishai
   Hadas)

 - Correct an un-alphabetized VFIO MAINTAINERS entry (Alex Williamson)

 - Make the mdev_bus_type const and also make the class struct const for
   a couple of the vfio-mdev sample drivers (Ricardo B. Marliere)

 - Addition of a new vfio-pci variant driver for the GPU of NVIDIA's
   Grace-Hopper superchip. During initialization of the chip-to-chip
   interconnect in this hardware module, the PCI BARs of the device
   become unused in favor of a faster, coherent mechanism for exposing
   device memory. This driver primarily changes the VFIO representation
   of the device to masquerade this coherent aperture to replace the
   physical PCI BARs for userspace drivers. This also incorporates use
   of a new vma flag allowing KVM to use write combining attributes for
   uncached device memory (Ankit Agrawal)

 - Reset fixes and cleanups for the pds-vfio-pci driver. Save and
   restore files were previously leaked if the device didn't pass
   through an error state, this is resolved and later re-fixed to
   prevent access to the now freed files. Reset handling is also
   refactored to remove the complicated deferred reset mechanism (Brett
   Creeley)

 - Remove some references to pl330 in the vfio-platform amba driver
   (Geert Uytterhoeven)

 - Remove twice redundant and ugly code to unpin incidental pins of the
   zero-page (Alex Williamson)

 - Deferred reset logic is also removed from the hisi-acc-vfio-pci
   driver as a simplification (Shameer Kolothum)

 - Enforce that mlx5-vfio-pci devices must support PRE_COPY and remove
   resulting unnecessary code. There is no device firmware that has been
   available publicly without this support (Yishai Hadas)

 - Switch over to using the .remove_new callback for vfio-platform in
   support of the broader transition for a void remove function (Uwe
   Kleine-König)

 - Resolve multiple issues in interrupt code for VFIO bus drivers that
   allow calling eventfd_signal() on a NULL context. This also removes a
   potential race in INTx setup on certain hardware for vfio-pci, races
   with various mechanisms to mask INTx, and leaked virqfds in
   vfio-platform (Alex Williamson)

* tag 'vfio-v6.9-rc1' of https://github.com/awilliam/linux-vfio: (29 commits)
  vfio/fsl-mc: Block calling interrupt handler without trigger
  vfio/platform: Create persistent IRQ handlers
  vfio/platform: Disable virqfds on cleanup
  vfio/pci: Create persistent INTx handler
  vfio: Introduce interface to flush virqfd inject workqueue
  vfio/pci: Lock external INTx masking ops
  vfio/pci: Disable auto-enable of exclusive INTx IRQ
  vfio/pds: Refactor/simplify reset logic
  vfio/pds: Make sure migration file isn't accessed after reset
  vfio/platform: Convert to platform remove callback returning void
  vfio/mlx5: Enforce PRE_COPY support
  vfio/mbochs: make mbochs_class constant
  vfio/mdpy: make mdpy_class constant
  hisi_acc_vfio_pci: Remove the deferred_reset logic
  Revert "vfio/type1: Unpin zero pages"
  vfio/nvgrace-gpu: Convey kvm to map device memory region as noncached
  vfio: amba: Rename pl330_ids[] to vfio_amba_ids[]
  vfio/pds: Always clear the save/restore FDs on reset
  vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper
  vfio/pci: rename and export range_intersect_range
  ...
2024-03-15 13:21:13 -07:00
Jakub Kicinski d7e14e5344 Support Multi-PF netdev (Socket Direct)
This series adds support for combining multiple devices (PFs) of the
 same port under one netdev instance. Passing traffic through different
 devices belonging to different NUMA sockets saves cross-numa traffic and
 allows apps running on the same netdev from different numas to still
 feel a sense of proximity to the device and achieve improved
 performance.
 
 We achieve this by grouping PFs together, and creating the netdev only
 once all group members are probed. Symmetrically, we destroy the netdev
 once any of the PFs is removed.
 
 The channels are distributed between all devices, a proper configuration
 would utilize the correct close numa when working on a certain app/cpu.
 
 We pick one device to be a primary (leader), and it fills a special
 role.  The other devices (secondaries) are disconnected from the network
 in the chip level (set to silent mode). All RX/TX traffic is steered
 through the primary to/from the secondaries.
 
 Currently, we limit the support to PFs only, and up to two devices
 (sockets).
 
 V6:
 - Address documentation comments from Jakub.
 
 V5:
  - Address documentation comments from Przemek Kitszel.
 
 V4:
  - Improve documentation for better user observability and understanding
    of the feature, in terms of queues and their expected NUMA/CPU/IRQ
    affinity.
 
 V3:
  - Fix documentation per Jakub's feedback.
  - Fix typos
  - Link new documentation in the networking index.rst
 
 V2:
  - Add documentation in a new patch.
  - Add debugfs in a new patch.
  - Add mlx5_ifc bit for MPIR cap check and use it before query.

Merge tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Support Multi-PF netdev (Socket Direct)

This series adds support for combining multiple devices (PFs) of the
same port under one netdev instance. Passing traffic through different
devices belonging to different NUMA sockets saves cross-numa traffic and
allows apps running on the same netdev from different numas to still
feel a sense of proximity to the device and achieve improved
performance.

We achieve this by grouping PFs together, and creating the netdev only
once all group members are probed. Symmetrically, we destroy the netdev
once any of the PFs is removed.

The channels are distributed between all devices, a proper configuration
would utilize the correct close numa when working on a certain app/cpu.

We pick one device to be a primary (leader), and it fills a special
role.  The other devices (secondaries) are disconnected from the network
in the chip level (set to silent mode). All RX/TX traffic is steered
through the primary to/from the secondaries.

Currently, we limit the support to PFs only, and up to two devices
(sockets).

* tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  Documentation: networking: Add description for multi-pf netdev
  net/mlx5: Enable SD feature
  net/mlx5e: Block TLS device offload on combined SD netdev
  net/mlx5e: Support per-mdev queue counter
  net/mlx5e: Support cross-vhca RSS
  net/mlx5e: Let channels be SD-aware
  net/mlx5e: Create EN core HW resources for all secondary devices
  net/mlx5e: Create single netdev per SD group
  net/mlx5: SD, Add debugfs
  net/mlx5: SD, Add informative prints in kernel log
  net/mlx5: SD, Implement steering for primary and secondaries
  net/mlx5: SD, Implement devcom communication and primary election
  net/mlx5: SD, Implement basic query and instantiation
  net/mlx5: SD, Introduce SD lib
  net/mlx5: Add MPIR bit in mcam_access_reg
====================

Link: https://lore.kernel.org/r/20240307084229.500776-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08 20:45:17 -08:00
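
The netdev lifecycle described in the Socket Direct series above (create the netdev only once all group members are probed, destroy it once any member is removed) can be sketched as a small state machine. This is a toy model under stated assumptions, not the mlx5 implementation: the class `SDGroup`, its methods, and the PF labels are hypothetical, and the default group size of two reflects the current limit mentioned in the cover letter.

```python
class SDGroup:
    """Toy model of a Socket Direct group of PFs sharing one netdev."""

    def __init__(self, size: int = 2):
        self.size = size            # number of PFs expected in the group
        self.probed = set()         # PFs probed so far
        self.netdev_exists = False  # whether the combined netdev is present

    def probe(self, pf: str) -> None:
        """Register a probed PF; create the netdev once the group is complete."""
        self.probed.add(pf)
        if len(self.probed) == self.size:
            self.netdev_exists = True

    def remove(self, pf: str) -> None:
        """Remove a PF; the netdev is destroyed as soon as any member leaves."""
        if pf in self.probed:
            self.probed.remove(pf)
            self.netdev_exists = False


if __name__ == "__main__":
    group = SDGroup()
    group.probe("pf0")
    print(group.netdev_exists)  # netdev not created yet, group incomplete
    group.probe("pf1")
    print(group.netdev_exists)  # all members probed
    group.remove("pf0")
    print(group.netdev_exists)  # one member removed
```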
Jakub Kicinski e3afe5dd3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

net/core/page_pool_user.c
  0b11b1c5c3 ("netdev: let netlink core handle -EMSGSIZE errors")
  429679dcf7 ("page_pool: fix netlink dump stop/resume")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 10:29:36 -08:00
Tariq Toukan a0873a5d54 net/mlx5: Add MPIR bit in mcam_access_reg
Add a cap bit in mcam_access_reg to check for MPIR support.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07 00:40:38 -08:00
Moshe Shemesh 5e6107b499 net/mlx5: Check capability for fw_reset
Functions which can't access MFRL (Management Firmware Reset Level)
register, have no use of fw_reset structures or events. Remove fw_reset
structures allocation and registration for fw reset events notifications
for these functions.

Having the devlink param enable_remote_dev_reset on functions that don't
have this capability is misleading as these functions are not allowed to
influence the reset flow. Hence, this patch removes this parameter for
such functions.

In addition, return not supported on devlink reload action fw_activate
for these functions.

Fixes: 38b9f903f2 ("net/mlx5: Handle sync reset request event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-01 23:02:26 -08:00
Jakub Kicinski fecc51559a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/ipv4/udp.c
  f796feabb9 ("udp: add local "peek offset enabled" flag")
  56667da739 ("net: implement lockless setsockopt(SO_PEEK_OFF)")

Adjacent changes:

net/unix/garbage.c
  aa82ac51d6 ("af_unix: Drop oob_skb ref before purging queue in GC.")
  11498715f2 ("af_unix: Remove io_uring code for GC.")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 15:29:26 -08:00
Yishai Hadas 1cbcb564f5 net/mlx5: Add the IFC related bits for query tracker
Add the IFC related bits for query tracker.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
Link: https://lore.kernel.org/r/20240205124828.232701-2-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-02-22 12:17:32 -07:00
Linus Torvalds 9fc1ccccfd RDMA v6.8 first rc
irdma and bnxt_re fixes:
 
 - Missing error unwind in hfi1
 
 - For bnxt - fix fencing behavior to work on new chips, fail unsupported
   SRQ resize back to userspace, propagate SRQ FW failure back to
   userspace.
 
 - Correctly fail unsupported SRQ resize back to userspace in bnxt
 
 - Adjust a memcpy in mlx5 to not overflow a struct field.
 
 - Prevent userspace from triggering mlx5 fw syndrome logging from sysfs
 
 - Use the correct access mode for MLX5_IB_METHOD_DEVX_OBJ_MODIFY to avoid
   a userspace failure on modify
 
 - For irdma - Don't UAF a concurrent tasklet during destroy, prevent
   userspace from issuing invalid QP attrs, fix a possible CQ overflow,
   capture a missing HW async error event
 
 - sendmsg() triggerable memory access crash in hfi1
 
 - Fix the srpt_service_guid parameter to not crash due to missing function
   pointer
 
 - Don't leak objects in error unwind in qedr
 
 - Don't weirdly cast function pointers in srpt

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Mostly irdma and bnxt_re fixes:

   - Missing error unwind in hfi1

   - For bnxt - fix fencing behavior to work on new chips, fail
     unsupported SRQ resize back to userspace, propagate SRQ FW failure
     back to userspace.

   - Correctly fail unsupported SRQ resize back to userspace in bnxt

   - Adjust a memcpy in mlx5 to not overflow a struct field.

   - Prevent userspace from triggering mlx5 fw syndrome logging from
     sysfs

   - Use the correct access mode for MLX5_IB_METHOD_DEVX_OBJ_MODIFY to
     avoid a userspace failure on modify

   - For irdma - Don't UAF a concurrent tasklet during destroy, prevent
     userspace from issuing invalid QP attrs, fix a possible CQ
     overflow, capture a missing HW async error event

   - sendmsg() triggerable memory access crash in hfi1

   - Fix the srpt_service_guid parameter to not crash due to missing
     function pointer

   - Don't leak objects in error unwind in qedr

   - Don't weirdly cast function pointers in srpt"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/srpt: fix function pointer cast warnings
  RDMA/qedr: Fix qedr_create_user_qp error flow
  RDMA/srpt: Support specifying the srpt_service_guid parameter
  IB/hfi1: Fix sdma.h tx->num_descs off-by-one error
  RDMA/irdma: Add AE for too many RNRS
  RDMA/irdma: Set the CQ read threshold for GEN 1
  RDMA/irdma: Validate max_send_wr and max_recv_wr
  RDMA/irdma: Fix KASAN issue with tasklet
  RDMA/mlx5: Relax DEVX access upon modify commands
  IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported
  RDMA/mlx5: Fix fortify source warning while accessing Eth segment
  RDMA/bnxt_re: Add a missing check in bnxt_qplib_query_srq
  RDMA/bnxt_re: Return error for SRQ resize
  RDMA/bnxt_re: Fix unconditional fence for newer adapters
  RDMA/bnxt_re: Remove a redundant check inside bnxt_re_vf_res_config
  RDMA/bnxt_re: Avoid creating fence MR for newer adapters
  IB/hfi1: Fix a memleak in init_credit_return
2024-02-20 17:00:26 -08:00
Gal Pressman 91a72ada66 net/mlx5: Remove initial segmentation duplicate definitions
Device definitions belong in mlx5_ifc, remove the duplicates in
mlx5_core.h.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-05 16:45:52 -08:00
Jiri Pirko 2c54a4d712 net/mlx5: DPLL, Implement lock status error value
Fill-up the lock status error value properly.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01 15:39:44 +01:00
Mark Zhang 43fdbd1402 IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported
debugfs entries for RRoCE general CC parameters must be exposed only when
they are supported, otherwise when accessing them there may be a syndrome
error in kernel log, for example:

$ cat /sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp
cat: '/sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp': Invalid argument
$ dmesg
 mlx5_core 0000:08:00.1: mlx5_cmd_out_err:805:(pid 1253): QUERY_CONG_PARAMS(0x824) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x325a82), err(-22)

Fixes: 66fb1d5df6 ("IB/mlx5: Extend debug control for CC parameters")
Reviewed-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/e7ade70bad52b7468bdb1de4d41d5fad70c8b71c.1706433934.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-01-31 11:15:29 +02:00
Moshe Shemesh ec7cc38ef9 net/mlx5: Bridge, fix multicast packets sent to uplink
To enable multicast packets which are offloaded in bridge multicast
offload mode to be sent also to uplink, FTE bit uplink_hairpin_en should
be set. Add this bit to FTE for the bridge multicast offload rules.

Fixes: 18c2916cee ("net/mlx5: Bridge, snoop igmp/mld packets")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24 00:15:35 -08:00
Tariq Toukan cfbc3608a8 net/mlx5: Fix query of sd_group field
The sd_group field moved in the HW spec from the MPIR register
to the vport context.
Align the query accordingly.

Fixes: f5e9563299 ("net/mlx5: Expose Management PCIe Index Register (MPIR)")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24 00:15:33 -08:00
Linus Torvalds 0b7359ccdd virtio: features, fixes
vdpa/mlx5: support for resumable vqs
 virtio_scsi: mq_poll support
 virtio_pmem: support SHMEM_REGION
 virtio_balloon: stay awake while adjusting balloon
 virtio: support for no-reset virtio PCI PM
 
 Fixes, cleanups.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - vdpa/mlx5: support for resumable vqs

 - virtio_scsi: mq_poll support

 - virtio_pmem: support SHMEM_REGION

 - virtio_balloon: stay awake while adjusting balloon

 - virtio: support for no-reset virtio PCI PM

 - Fixes, cleanups

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa/mlx5: Add mkey leak detection
  vdpa/mlx5: Introduce reference counting to mrs
  vdpa/mlx5: Use vq suspend/resume during .set_map
  vdpa/mlx5: Mark vq state for modification in hw vq
  vdpa/mlx5: Mark vq addrs for modification in hw vq
  vdpa/mlx5: Introduce per vq and device resume
  vdpa/mlx5: Allow modifying multiple vq fields in one modify command
  vdpa/mlx5: Expose resumable vq capability
  vdpa: Block vq property changes in DRIVER_OK
  vdpa: Track device suspended state
  scsi: virtio_scsi: Add mq_poll support
  virtio_pmem: support feature SHMEM_REGION
  virtio_balloon: stay awake while adjusting balloon
  vdpa: Remove usage of the deprecated ida_simple_xx() API
  virtio: Add support for no-reset virtio PCI PM
  virtio_net: fix missing dma unmap for resize
  vhost-vdpa: account iommu allocations
  vdpa: Fix an error handling path in eni_vdpa_probe()
2024-01-18 16:44:03 -08:00
Linus Torvalds bf9ca811bb RDMA v6.8 merge window
Small cycle, with some typical driver updates
 
  - General code tidying in siw, hfi1, irdma, usnic, hns rtrs and bnxt_re
 
  - Many small siw cleanups without an overarching theme
 
  - Debugfs stats for hns
 
  - Fix a TX queue timeout in IPoIB and missed locking of the mcast list
 
  - Support more features of P7 devices in bnxt_re including a new work
    submission protocol
 
  - CQ interrupts for MANA
 
  - netlink stats for erdma
 
  - EFA multipath PCI support
 
  - Fix Incorrect MR invalidation in iser

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Small cycle, with some typical driver updates:

   - General code tidying in siw, hfi1, irdma, usnic, hns rtrs and
     bnxt_re

   - Many small siw cleanups without an overarching theme

   - Debugfs stats for hns

   - Fix a TX queue timeout in IPoIB and missed locking of the mcast
     list

   - Support more features of P7 devices in bnxt_re including a new work
     submission protocol

   - CQ interrupts for MANA

   - netlink stats for erdma

   - EFA multipath PCI support

   - Fix Incorrect MR invalidation in iser"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits)
  RDMA/bnxt_re: Fix error code in bnxt_re_create_cq()
  RDMA/efa: Add EFA query MR support
  IB/iser: Prevent invalidating wrong MR
  RDMA/erdma: Add hardware statistics support
  RDMA/erdma: Introduce dma pool for hardware responses of CMDQ requests
  IB/iser: iscsi_iser.h: fix kernel-doc warning and spellos
  RDMA/mana_ib: Add CQ interrupt support for RAW QP
  RDMA/mana_ib: query device capabilities
  RDMA/mana_ib: register RDMA device with GDMA
  RDMA/bnxt_re: Fix the sparse warnings
  RDMA/bnxt_re: Fix the offset for GenP7 adapters for user applications
  RDMA/bnxt_re: Share a page to expose per CQ info with userspace
  RDMA/bnxt_re: Add UAPI to share a page with user space
  IB/ipoib: Fix mcast list locking
  RDMA/mlx5: Expose register c0 for RDMA device
  net/mlx5: E-Switch, expose eswitch manager vport
  net/mlx5: Manage ICM type of SW encap
  RDMA/mlx5: Support handling of SW encap ICM area
  net/mlx5: Introduce indirect-sw-encap ICM properties
  RDMA/bnxt_re: Adds MSN table capability for Gen P7 adapters
  ...
2024-01-12 13:52:21 -08:00
Dragos Tatulea ef067191f7 vdpa/mlx5: Expose resumable vq capability
Necessary for checking if resumable vqs are supported by the hardware.
Actual support will be added in a downstream patch.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20231225151203.152687-2-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-10 13:01:37 -05:00
Jakub Kicinski 3fbf61207c Revert "mlx5 updates 2023-12-20"
Revert "net/mlx5: Implement management PF Ethernet profile"
This reverts commit 22c4640698.
Revert "net/mlx5: Enable SD feature"
This reverts commit c88c49ac9c.
Revert "net/mlx5e: Block TLS device offload on combined SD netdev"
This reverts commit 83a59ce005.
Revert "net/mlx5e: Support per-mdev queue counter"
This reverts commit d72baceb92.
Revert "net/mlx5e: Support cross-vhca RSS"
This reverts commit c73a3ab8fa.
Revert "net/mlx5e: Let channels be SD-aware"
This reverts commit e4f9686bde.
Revert "net/mlx5e: Create EN core HW resources for all secondary devices"
This reverts commit c4fb94aa82.
Revert "net/mlx5e: Create single netdev per SD group"
This reverts commit e2578b4f98.
Revert "net/mlx5: SD, Add informative prints in kernel log"
This reverts commit c82d360325.
Revert "net/mlx5: SD, Implement steering for primary and secondaries"
This reverts commit 605fcce33b.
Revert "net/mlx5: SD, Implement devcom communication and primary election"
This reverts commit a45af9a967.
Revert "net/mlx5: SD, Implement basic query and instantiation"
This reverts commit 63b9ce944c.
Revert "net/mlx5: SD, Introduce SD lib"
This reverts commit 4a04a31f49.
Revert "net/mlx5: Fix query of sd_group field"
This reverts commit e04984a373.
Revert "net/mlx5e: Use the correct lag ports number when creating TISes"
This reverts commit a7e7b40c4b.

There are some unanswered questions on the list, and we don't
have any docs. Given the lack of replies so far and the fact
that v6.8 merge window has started - let's revert this and
revisit for v6.9.

Link: https://lore.kernel.org/all/20231221005721.186607-1-saeed@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-07 17:16:11 -08:00
Armen Ratner 22c4640698 net/mlx5: Implement management PF Ethernet profile
Add management PF modules, which introduce support for the structures
needed to create the resources for the MGMT PF to work.
Also, add the necessary calls and functions to establish this
functionality.

Signed-off-by: Armen Ratner <armeng@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
2023-12-20 16:54:27 -08:00
Tariq Toukan e04984a373 net/mlx5: Fix query of sd_group field
The sd_group field moved in the HW spec from the MPIR register
to the vport context.
Align the query accordingly.

Fixes: f5e9563299 ("net/mlx5: Expose Management PCIe Index Register (MPIR)")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20 16:54:24 -08:00
David S. Miller 12da68e27b mlx5-updates-2023-12-13
Preparation for mlx5e socket direct feature.
 
 Socket direct will allow multiple PF devices attached to different
 NUMA nodes but sharing the same physical port.
 
 The following series is a small refactoring series in preparation
 to support socket direct in the following submission.
 
 Highlights:
  - Define required device registers and bits related to socket direct
  - Flow steering re-arrangements
  - Generalize TX objects (TISs) and store them in a common object, will
    be useful in the next series for per function object management.
  - Decouple raw CQ objects from their parent netdev priv
  - Prepare devcom for Socket Direct device group discovery.
 
 Please see the individual patches for more information.

Merge tag 'mlx5-updates-2023-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-12-13

Preparation for mlx5e socket direct feature.

Socket direct will allow multiple PF devices attached to different
NUMA nodes but sharing the same physical port.

The following series is a small refactoring series in preparation
to support socket direct in the following submission.

Highlights:
 - Define required device registers and bits related to socket direct
 - Flow steering re-arrangements
 - Generalize TX objects (TISs) and store them in a common object, will
   be useful in the next series for per function object management.
 - Decouple raw CQ objects from their parent netdev priv
 - Prepare devcom for Socket Direct device group discovery.

Please see the individual patches for more information.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-15 10:00:02 +00:00
Jakub Kicinski 8f674972d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/intel/iavf/iavf_ethtool.c
  3a0b5a2929 ("iavf: Introduce new state machines for flow director")
  95260816b4 ("iavf: use iavf_schedule_aq_request() helper")
https://lore.kernel.org/all/84e12519-04dc-bd80-bc34-8cf50d7898ce@intel.com/

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  c13e268c07 ("bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logic")
  c2f8063309 ("bnxt_en: Refactor RX VLAN acceleration logic.")
  a7445d6980 ("bnxt_en: Add support for new RX and TPA_START completion types for P7")
  1c7fd6ee2f ("bnxt_en: Rename some macros for the P5 chips")
https://lore.kernel.org/all/20231211110022.27926ad9@canb.auug.org.au/

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
  bd6781c18c ("bnxt_en: Fix wrong return value check in bnxt_close_nic()")
  84793a4995 ("bnxt_en: Skip nic close/open when configuring tstamp filters")
https://lore.kernel.org/all/20231214113041.3a0c003c@canb.auug.org.au/

drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
  3d7a3f2612 ("net/mlx5: Nack sync reset request when HotPlug is enabled")
  cecf44ea1a ("net/mlx5: Allow sync reset flow when BF MGT interface device is present")
https://lore.kernel.org/all/20231211110328.76c925af@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-14 17:14:41 -08:00
Tariq Toukan f5e9563299 net/mlx5: Expose Management PCIe Index Register (MPIR)
The MPIR register allows querying the PCIe indexes
and Socket-Direct related parameters.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-13 18:03:30 -08:00
Tariq Toukan 13049408a4 net/mlx5: Add mlx5_ifc bits used for supporting single netdev Socket-Direct
Multiple device caps and features are required to support
single netdev Socket-Direct.
Add them here in preparation for the feature implementation.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-13 18:03:30 -08:00
Shun Hao 1ca51628e7 net/mlx5: Introduce indirect-sw-encap ICM properties
Add new fields for device memory capabilities, in order to support
creation of new ICM memory type of SW encap.

Signed-off-by: Shun Hao <shunh@nvidia.com>
Link: https://lore.kernel.org/r/107cca7dd6a932a1704abf6ebd1b801105546a8e.1701871118.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-12-12 09:03:46 +02:00
Leon Romanovsky c2bf84f1d1 net/mlx5e: Tidy up IPsec NAT-T SA discovery
IPsec NAT-T packets are ESP packets encapsulated in UDP.
When they arrive on RX, the SPI and ESP are located in the inner header,
while the check was performed on the outer header instead.

That wrong check led to a situation where a received rekeying request
was missed and caused a rekey timeout, which "compensated" for this
failure by completing the rekeying.

Fixes: d659549349 ("net/mlx5e: Support IPsec NAT-T functionality")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04 22:11:52 -08:00
Leon Romanovsky a5e400a985 net/mlx5e: Honor user choice of IPsec replay window size
Users can configure the IPsec replay window size, but the mlx5 driver
didn't honor their choice and always set 32 bits. Fix the assignment
logic to configure the right size from the beginning.

Fixes: 7db21ef456 ("net/mlx5e: Set IPsec replay sequence numbers")
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04 22:11:51 -08:00
Rahul Rameshbabu 4aea6a6d61 net/mlx5: Query maximum frequency adjustment of the PTP hardware clock
Some mlx5 devices do not support the default advertised maximum frequency
adjustment value for the PTP hardware clock that is set by the driver.
These devices need to be queried when initializing the clock functionality
in order to get the maximum supported frequency adjustment value. This
value can be greater than the minimum supported frequency adjustment across
mlx5 devices (50 million ppb).

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-11-15 11:34:31 -08:00
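
The selection described in the PTP commit above can be sketched roughly as follows. This is a minimal illustration in Python, not the driver's code: the function and parameter names are hypothetical, and only the 50 million ppb figure comes from the commit message.

```python
# Default maximum frequency adjustment quoted in the commit message (ppb).
DEFAULT_MAX_ADJ_PPB = 50_000_000


def effective_max_adj(queried_max_adj_ppb=None):
    """Pick the limit to advertise for the PTP hardware clock.

    queried_max_adj_ppb is a hypothetical stand-in for the value read from
    the device; None models a device that does not report its own limit.
    """
    if queried_max_adj_ppb is None:
        return DEFAULT_MAX_ADJ_PPB
    return queried_max_adj_ppb


def is_within_limit(requested_ppb, max_adj_ppb):
    """Check that a requested offset lies inside [-max_adj, +max_adj]."""
    return -max_adj_ppb <= requested_ppb <= max_adj_ppb
```

A device that reports a larger limit simply widens the accepted range; one that reports nothing keeps the default.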
Linus Torvalds 77fa2fbe87 vhost,virtio,vdpa: features, fixes, cleanups
vdpa/mlx5:
 	VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
 	new maintainer
 vdpa:
 	support for vq descriptor mappings
 	decouple reset of iotlb mapping from device reset
 
 fixes, cleanups all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "vhost,virtio,vdpa: features, fixes, cleanups.

  vdpa/mlx5:
   - VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
   - new maintainer

  vdpa:
   - support for vq descriptor mappings
   - decouple reset of iotlb mapping from device reset

  and fixes, cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
  vdpa_sim: implement .reset_map support
  vdpa/mlx5: implement .reset_map driver op
  vhost-vdpa: clean iotlb map during reset for older userspace
  vdpa: introduce .compat_reset operation callback
  vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
  vhost-vdpa: reset vendor specific mapping to initial state in .release
  vdpa: introduce .reset_map operation callback
  virtio_pci: add check for common cfg size
  virtio-blk: fix implicit overflow on virtio_max_dma_size
  virtio_pci: add build offset check for the new common cfg items
  virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit
  vduse: make vduse_class constant
  vhost-scsi: Spelling s/preceeding/preceding/g
  virtio: kdoc for struct virtio_pci_modern_device
  vdpa: Update sysfs ABI documentation
  MAINTAINERS: Add myself as mlx5_vdpa driver
  virtio-balloon: correct the comment of virtballoon_migratepage()
  mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
  vdpa/mlx5: Update cvq iotlb mapping on ASID change
  vdpa/mlx5: Make iotlb helper functions more generic
  ...
2023-11-05 09:02:32 -10:00
Jakub Kicinski 1bc60524ca Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:

====================
This PR is collected from
https://lore.kernel.org/all/cover.1695296682.git.leon@kernel.org

This series from Patrisious extends mlx5 to support IPsec packet offload
in multiport devices (MPV, see [1] for more details).

These devices have a single flow steering logic and two netdev interfaces,
which requires extra logic to manage IPsec configurations, as they are
performed on netdevs.

[1] https://lore.kernel.org/linux-rdma/20180104152544.28919-1-leon@kernel.org/

* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Handle IPsec steering upon master unbind/bind
  net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic
  net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic
  net/mlx5: Add create alias flow table function to ipsec roce
  net/mlx5: Implement alias object allow and create functions
  net/mlx5: Add alias flow table bits
  net/mlx5: Store devcom pointer inside IPsec RoCE
  net/mlx5: Register mlx5e priv to devcom in MPV mode
  RDMA/mlx5: Send events from IB driver about device affiliation state
  net/mlx5: Introduce ifc bits for migration in a chunk mode

====================

Link: https://lore.kernel.org/r/20231002083832.19746-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 09:35:34 -07:00
Dragos Tatulea d424348b06 vdpa/mlx5: Expose descriptor group mkey hw capability
Necessary for improved live migration flow. Actual support will be added
in a downstream patch.

Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://lore.kernel.org/r/20230928164550.980832-3-dtatulea@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 13:12:05 +03:00
Patrisious Haddad ef36ffcb38 net/mlx5: Add alias flow table bits
Add all the capabilities needed to check for alias object support,
as well as all the fields and commands needed for its creation and
the creation of a flow table that is able to jump to an alias object.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/544c030f2a78c4adf3fe6b64f97a39cc1bbdabb9.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02 11:21:11 +03:00
Yishai Hadas 5aa4c9608d net/mlx5: Introduce ifc bits for migration in a chunk mode
Introduce the ifc definitions needed to enable migration in chunk mode.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20230911093856.81910-2-yishaih@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-28 13:58:38 +03:00
Moshe Shemesh e0cc92fd94 net/mlx5: Add a health error syndrome for pci data poisoned
Add new health error syndrome to indicate that pci data poisoned error
has been received while fetching device ICM data.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-09-19 23:33:08 -07:00
Jiri Pirko 496fd0a26b mlx5: Implement SyncE support using DPLL infrastructure
Implement SyncE support using the newly introduced DPLL support.
Make sure that each PF/VF/SF probed with the appropriate capability
spawns a dpll auxiliary device and registers the appropriate dpll device
and pin instances.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17 11:50:20 +01:00
Leon Romanovsky 17c8da5a34 net/mlx5: Add IFC bits to support IPsec enable/disable
Add hardware definitions to allow controlling IPsec capabilities.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-6-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27 17:08:45 -07:00
Shay Drory 0b4eb603d6 net/mlx5: Remove unused CAPs
The mlx5 driver queries the device for VECTOR_CALC and SHAMPO caps, but
there isn't any user who requires them.
Likewise, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not used.

Thus, drop all usages and definitions of the caps mentioned above.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-14 14:40:22 -07:00
Moshe Shemesh a9f168e4c6 net/mlx5: Check with FW that sync reset completed successfully
Even if the PF driver had no error on its part of the sync reset flow,
the firmware can see the wider picture as it syncs all the PFs in the flow.
So, at the end of the sync reset flow, check with firmware by reading the
MFRL register and the initialization segment that the flow had no issue
from the firmware's point of view either.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-14 14:40:21 -07:00
Adham Faris 1f507e80c7 net/mlx5: Expose NIC temperature via hardware monitoring kernel API
Expose NIC temperature by implementing the hwmon kernel API, which makes
the current thermal zone kernel API redundant.

For each one of the supported and exposed thermal diode sensors, expose
the following attributes:
1) Input temperature.
2) Highest temperature.
3) Temperature label:
   Depends on the firmware capability, if firmware doesn't support
   sensors naming, the fallback naming convention would be: "sensorX",
   where X is the HW spec (MTMP register) sensor index.
4) Temperature critical max value:
   refers to the high threshold of Warning Event. Will be exposed as
   `tempY_crit` hwmon attribute (RO attribute). For example, for
   ConnectX5 HCAs this temperature value will be 105 Celsius (10
   degrees lower than the HW shutdown temperature).
5) Temperature reset history: resets highest temperature.

For example, a dual-port ConnectX5 NIC with a single IC thermal diode
sensor will have 2 hwmon directories (one for each PCI function)
under "/sys/class/hwmon/hwmon[X,Y]".

Listing one of the directories above (hwmonX/Y) generates the
corresponding output below:

$ grep -H -d skip . /sys/class/hwmon/hwmon0/*

Output
=======================================================================
/sys/class/hwmon/hwmon0/name:mlx5
/sys/class/hwmon/hwmon0/temp1_crit:105000
/sys/class/hwmon/hwmon0/temp1_highest:48000
/sys/class/hwmon/hwmon0/temp1_input:46000
/sys/class/hwmon/hwmon0/temp1_label:asic
grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied

In addition, displaying the sensors data via lm_sensors generates the
corresponding output below:

$ sensors

Output
=======================================================================
mlx5-pci-0800
Adapter: PCI adapter
asic:         +46.0°C  (crit = +105.0°C, highest = +48.0°C)

mlx5-pci-0801
Adapter: PCI adapter
asic:         +46.0°C  (crit = +105.0°C, highest = +48.0°C)

CC: Jean Delvare <jdelvare@suse.com>
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807180507.22984-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09 15:52:16 -07:00
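
The relation between the raw sysfs values and the `sensors` output in the commit above can be sketched as follows. This is an illustrative Python snippet, not part of the patch: the helper names are hypothetical, the sample lines are copied from the commit message, and the conversion assumes the usual hwmon convention that temperature attributes are reported in millidegrees Celsius.

```python
def parse_hwmon_line(line):
    """Split one 'path:value' line, as printed by `grep -H`, into (attribute, value)."""
    path, _, value = line.partition(":")
    return path.rsplit("/", 1)[-1], value


def millidegrees_to_celsius(raw):
    """Convert a raw hwmon temperature string (millidegrees C) to degrees C."""
    return int(raw) / 1000.0


# Sample lines taken from the commit message output.
sample = [
    "/sys/class/hwmon/hwmon0/name:mlx5",
    "/sys/class/hwmon/hwmon0/temp1_crit:105000",
    "/sys/class/hwmon/hwmon0/temp1_highest:48000",
    "/sys/class/hwmon/hwmon0/temp1_input:46000",
    "/sys/class/hwmon/hwmon0/temp1_label:asic",
]

readings = {}
for line in sample:
    attr, value = parse_hwmon_line(line)
    # Only the numeric temperature attributes are converted.
    if attr.startswith("temp") and attr.endswith(("_input", "_crit", "_highest")):
        readings[attr] = millidegrees_to_celsius(value)

print(readings)
```

The converted values match the figures `sensors` prints for the same device (46.0, 105.0 and 48.0 degrees Celsius).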
Leon Romanovsky 5726628127 net/mlx5: Add relevant capabilities bits to support NAT-T
Provide an ability to check if flow steering supports UDP
encapsulation and decapsulation of IPsec ESP packets.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-25 15:08:57 +02:00
Lama Kayal 9ee473c259 net/mlx5: Fix reserved at offset in hca_cap register
A member of struct mlx5_ifc_cmd_hca_cap_bits has been mistakenly
assigned the wrong reserved_at offset value. Correct it to align to the
right value, thus avoiding future miscalculations.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-23 12:27:33 -07:00
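
The kind of mistake fixed by the commit above can be caught mechanically, because mlx5_ifc reserved fields carry their bit offset (in hex) in their name. The following Python sketch shows the idea only; the checker and the field layout are hypothetical and are not taken from the real struct.

```python
def check_reserved_offsets(fields):
    """Return (name, expected_offset) for every misnamed reserved field.

    fields is a list of (name, width_in_bits) pairs in declaration order.
    A field named reserved_at_<hex> is expected to start at bit <hex>.
    """
    mismatches = []
    offset = 0
    for name, width in fields:
        if name.startswith("reserved_at_"):
            encoded = int(name[len("reserved_at_"):], 16)
            if encoded != offset:
                mismatches.append((name, offset))
        offset += width
    return mismatches


# Hypothetical layout: the second reserved field should be reserved_at_28.
layout = [
    ("field_a", 0x10),
    ("reserved_at_10", 0x10),
    ("field_b", 0x8),
    ("reserved_at_20", 0x18),
]

for name, expected in check_reserved_offsets(layout):
    print(f"{name}: expected offset {expected:#x}")
```

A wrong name does not change the layout by itself, but it misleads whoever next computes offsets from it, which is the miscalculation the commit message refers to.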
Or Har-Toov 0bd2e6fc78 net/mlx5: Expose bits for local loopback counter
Add needed HW bits for querying local loopback counter and the
HCA capability for it.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16 12:02:08 -07:00
Moshe Shemesh 7a9770f1bf net/mlx5: Handle sync reset unload event
Added a new event handler to firmware sync reset, which is used to
support firmware sync reset flow on smart NIC. Adding this new stage to
the flow enables the firmware to ensure host PFs unload before ECPFs
unload, to avoid race of PFs recovery.

If firmware sends a sync_reset_unload event to the driver, the driver
should unload and close all HW resources of the function. Once the driver
finishes the unloading part, it can't get any more events from firmware,
as event queues are closed, so it polls the reset state field to know when
to continue to the next stage of the sync reset flow.

Added capability bit for supporting sync_reset_unload event.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16 12:02:07 -07:00
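
The polling step described in the commit above (no events are available once the event queues are closed, so the driver polls a state field until a timeout expires) can be sketched generically as follows. This is a Python illustration under stated assumptions: `read_state`, the state values and the timing parameters are hypothetical and do not mirror the driver's actual fields or timeouts.

```python
import time


def wait_for_state(read_state, wanted, timeout_s, interval_s=0.1,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll read_state() until it returns `wanted` or timeout_s elapses.

    Returns True if the wanted state was observed, False on timeout.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if read_state() == wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Usage with a fake state source that reaches the wanted state on the third read.
states = iter(["unloading", "unloading", "ready"])
reached = wait_for_state(lambda: next(states), "ready", timeout_s=1.0,
                         sleep=lambda _s: None)
print(reached)
```

The state is checked before the deadline test, so a state that is already set is reported even when the timeout is zero.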
Moshe Shemesh 8bb42ed421 net/mlx5: Expose timeout for sync reset unload stage
Expose a new timeout in the Default Timeouts Register to be used in the
sync reset flow running on smart NIC. In this flow the driver should know
how much time to wait from getting the unload request until firmware asks
the PF to continue to the next stage of the flow.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16 12:02:06 -07:00
Daniel Jurgens 93b36d0f28 net/mlx5: mlx5_ifc updates for embedded CPU SRIOV
Add ec_vf_vport_base to HCA Capabilities 2. This indicates the base vport
of embedded CPU virtual functions that are connected to the eswitch.

Add ec_vf_function to query/set_hca_caps. If set this indicates
accessing a virtual function on the embedded CPU by function ID. This
should only be used with other_function set to 1.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09 18:40:50 -07:00
Lama Kayal a33682e4e7 net/mlx5e: Expose catastrophic steering error counters
Add generated_pkt_steering_fail and handled_pkt_steering_fail to the devlink
health reporter.
generated_pkt_steering_fail indicates the number of packets dropped due to
illegal steering operation within the vport steering domain.
handled_pkt_steering_fail indicates the number of packets dropped due to
illegal steering operation, originated by the vport.

Also, update devlink reporter functionality documentation with the newly
exposed counters.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07 14:00:43 -07:00
Yevgeny Kliteynik c7dd225bc2 net/mlx5: DR, Check force-loopback RC QP capability independently from RoCE
SW Steering uses RC QP for writing STEs to ICM. This writing is done in LB
(loopback), and FL (force-loopback) QP is preferred for performance. FL is
available when RoCE is enabled or disabled based on RoCE caps.
This patch adds reading of FL capability from HCA caps in addition to the
existing reading from RoCE caps, thus fixing the case where we didn't
have loopback enabled when RoCE was disabled.

Fixes: 7304d603a5 ("net/mlx5: DR, Add support for force-loopback QP")
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:05 -07:00
Linus Torvalds af3877265d v6.4 merge window RDMA pull request
Usual wide collection of unrelated items in drivers:
 
 - Driver bug fixes and treewide cleanups in hfi1, siw, qib, mlx5, rxe,
   usnic, bnxt_re, ocrdma, iser
    * Unnecessary NULL checks
    * kmap obsolescence
    * pci_enable_pcie_error_reporting() obsolescence
    * Unused variables and macros
    * trace event related warnings
    * casting warnings
 
 - Code cleanups for irdma and erdma
 
 - EFA reporting of 128 byte PCIe TLP support
 
 - mlx5 more aggressively uses the out of order HW feature
 
 - Big rework of how state machines and tasks work in rxe
 
 - Fix a syzkaller found crash netdev refcount leak in siw
 
 - bnxt_re revises their HW description header
 
 - Congestion control for bnxt_re
 
 - Use mmu_notifiers more safely in hfi1
 
 - mlx5 gets better support for PCIe relaxed ordering inside VMs

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Usual wide collection of unrelated items in drivers:

   - Driver bug fixes and treewide cleanups in hfi1, siw, qib, mlx5,
     rxe, usnic, bnxt_re, ocrdma, iser:
       - remove unnecessary NULL checks
       - kmap obsolescence
       - pci_enable_pcie_error_reporting() obsolescence
       - unused variables and macros
       - trace event related warnings
       - casting warnings

   - Code cleanups for irdma and erdma

   - EFA reporting of 128 byte PCIe TLP support

   - mlx5 more aggressively uses the out of order HW feature

   - Big rework of how state machines and tasks work in rxe

   - Fix a syzkaller found crash netdev refcount leak in siw

   - bnxt_re revises their HW description header

   - Congestion control for bnxt_re

   - Use mmu_notifiers more safely in hfi1

   - mlx5 gets better support for PCIe relaxed ordering inside VMs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (81 commits)
  RDMA/efa: Add rdma write capability to device caps
  RDMA/mlx5: Use correct device num_ports when modify DC
  RDMA/irdma: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call
  RDMA/rxe: Fix spinlock recursion deadlock on requester
  RDMA/mlx5: Fix flow counter query via DEVX
  RDMA/rxe: Protect QP state with qp->state_lock
  RDMA/rxe: Move code to check if drained to subroutine
  RDMA/rxe: Remove qp->req.state
  RDMA/rxe: Remove qp->comp.state
  RDMA/rxe: Remove qp->resp.state
  RDMA/mlx5: Allow relaxed ordering read in VFs and VMs
  net/mlx5: Update relaxed ordering read HCA capabilities
  RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
  RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
  RDMA: Add ib_virt_dma_to_page()
  RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task"
  RDMA/irdma: Slightly optimize irdma_form_ah_cm_frame()
  RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c
  IB/hfi1: Place struct mmu_rb_handler on cache line start
  IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests
  ...
2023-04-29 17:21:24 -07:00
Roi Dayan f9c895a72a net/mlx5: Update op_mode to op_mod for port selection
To be consistent with the other enum keys use OP_MOD
instead of OP_MODE.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-20 18:35:50 -07:00
Mark Bloch 3e358ea861 RDMA/mlx5: Fix flow counter query via DEVX
The commit cited in the "fixes" tag added bulk support for flow counters,
but it didn't account for the fact that it's also possible to query a
counter using a non-base id if the counter was allocated as bulk.

When a user performs a query, validate that the flow counter id given in
the mailbox is inside the valid range, taking the bulk value into account.

Fixes: 208d70f562 ("IB/mlx5: Support flow counters offset for bulk counters")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Link: https://lore.kernel.org/r/79d7fbe291690128e44672418934256254d93115.1681377114.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-18 08:47:10 +03:00
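
The range check described in the commit above can be sketched as follows. This is a simplified Python illustration, not the DEVX code: the function name is hypothetical, and treating the bulk value as a plain count of consecutive counter ids is an assumption made for the example.

```python
def counter_id_in_range(base_id, bulk_count, query_id):
    """Check that query_id falls inside a counter allocation.

    base_id    -- first counter id of the allocation
    bulk_count -- number of consecutive ids allocated (assumed encoding);
                  a non-bulk allocation is treated as a single id
    query_id   -- id the user asks to query
    """
    count = max(bulk_count, 1)
    return base_id <= query_id < base_id + count


# A bulk of 4 counters starting at id 100 covers ids 100..103.
print(counter_id_in_range(100, 4, 102))
print(counter_id_in_range(100, 4, 104))
```

Comparing only against the base id would reject the valid non-base ids of a bulk allocation, which is the case the fix addresses.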
Leon Romanovsky 1210af3b99 net/mlx5e: Add IPsec packet offload tunnel bits
Extend packet reformat types and flow table capabilities with
IPsec packet offload tunnel bits.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-17 18:55:25 -07:00
Avihai Horon ccbbfe0682 net/mlx5: Update relaxed ordering read HCA capabilities
Rename existing HCA capability relaxed_ordering_read to
relaxed_ordering_read_pci_enabled. This is in accordance with recent PRM
change to better describe the capability, as it's set only if both the
device supports relaxed ordering (RO) read and RO is enabled in PCI
config space.

In addition, add new HCA capability relaxed_ordering_read which is set
if the device supports RO read, regardless of RO in PCI config space.
This will be used in the following patch to allow RO in VFs and VMs.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Link: https://lore.kernel.org/r/caa0002fd8135086357dfcc368e2f5cc73b08480.1681131553.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-16 13:29:19 +03:00
Yevgeny Kliteynik 9fa7f1de3d net/mlx5: Add mlx5_ifc bits for modify header argument
Add enum value for modify-header argument object and mlx5_bits
for the related capabilities.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:37 -07:00
Vlad Buslov e5688f6fb9 net/mlx5: Add mlx5_ifc definitions for bridge multicast support
Add the required hardware definitions to mlx5_ifc: fdb_uplink_hairpin,
fdb_multi_path_any_table_limit_regc, fdb_multi_path_any_table.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-11 20:57:35 -07:00
Patrisious Haddad 77f7eb9f34 net/mlx5: Introduce other vport query for Q-counters
These new fields in the QUERY_Q_COUNTER command allow us to access
another vport's counters during the query command, which is especially
useful for querying representor vports.

In addition, add the required caps to check whether this capability
is actually supported.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/75c73a4a0e60f18c37b35a4a11ca2e2415e4a6f3.1679566038.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 10:08:40 +03:00
Or Har-Toov 6e2a3a324a net/mlx5: Expose bits for enabling out-of-order by default
Add needed HW bits for enabling out-of-order by default and
use go_back_n when out-of-order is not needed.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/75d6dfe263989a05c08c43406132b336ea12d00a.1679230449.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-23 10:21:13 +02:00
Gavin Li 6ee44c5181 net/mlx5e: TC, Add support for VxLAN GBP encap/decap flows offload
Add HW offloading support for TC flows with VxLAN GBP encap/decap.

Example of encap rule:
tc filter add dev eth0 protocol ip ingress flower \
    action tunnel_key set id 42 vxlan_opts 512 \
    action mirred egress redirect dev vxlan1

Example of decap rule:
tc filter add dev vxlan1 protocol ip ingress flower \
    enc_key_id 42 enc_dst_port 4789 vxlan_opts 1024 \
    action tunnel_key unset action mirred egress redirect dev eth0

Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-17 22:41:16 -07:00
Sandipan Patra c1fef618d6 net/mlx5: Implement thermal zone
Implement thermal zone support for mlx5 based HW. The NIC
uses the temperature sensor provided by the ASIC to report the current
temperature to the thermal core.

Signed-off-by: Sandipan Patra <spatra@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15 22:09:14 -07:00
Linus Torvalds 8cbd92339d v6.3 RDMA pull request
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Quite a small cycle this time, even with the rc8. I suppose everyone
  went to sleep over xmas.

   - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw,
     mana

   - inline CQE support for hns

   - Have mlx5 display device error codes

   - Pinned DMABUF support for irdma

   - Continued rxe cleanups, particularly converting the MRs to use
     xarray

   - Improvements to what can be cached in the mlx5 mkey cache"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits)
  IB/mlx5: Extend debug control for CC parameters
  IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors
  IB/hfi1: Fix math bugs in hfi1_can_pin_pages()
  RDMA/irdma: Add support for dmabuf pin memory regions
  RDMA/mlx5: Use query_special_contexts for mkeys
  net/mlx5e: Use query_special_contexts for mkeys
  net/mlx5: Change define name for 0x100 lkey value
  net/mlx5: Expose bits for querying special mkeys
  RDMA/rxe: Fix missing memory barriers in rxe_queue.h
  RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet
  RDMA/rxe: Remove rxe_alloc()
  RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size
  Subject: RDMA/rxe: Handle zero length rdma
  iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry()
  RDMA/mlx5: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Remove unused 'work' member from struct ib_umem
  RDMA/irdma: Cap MSIX used to online CPUs + 1
  RDMA/mlx5: Check reg_create() create for errors
  RDMA/restrack: Correct spelling
  RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish()
  ...
2023-02-24 15:11:03 -08:00
Edward Srouji 66fb1d5df6 IB/mlx5: Extend debug control for CC parameters
This patch adds rtt_resp_dscp to the current debug controllability of
congestion control (CC) parameters.
rtt_resp_dscp can be read or written through debugfs.
If set, its value overwrites the DSCP of the generated RTT response.

Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Link: https://lore.kernel.org/r/1dcc3440ee53c688f19f579a051ded81a2aaa70a.1676538714.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-19 11:50:59 +02:00
Or Har-Toov 4b7296aa6c net/mlx5: Expose bits for querying special mkeys
Add needed HW bits to query the values of all special mkeys.

Link: https://lore.kernel.org/r/080ebb563a9717c15b1ea75d669aede676df386b.1673960981.git.leon@kernel.org
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17 16:22:22 -04:00
Jakub Kicinski 84cb1b53cd Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:

====================
mlx5-next changes

Following previous conversations [1] and our clear commitment to do
the TC work [2], please pull mlx5-next shared branch, which includes
low-level steering logic to allow RoCEv2 traffic to be encrypted/
decrypted through IPsec.

[1] https://lore.kernel.org/all/20230126230815.224239-1-saeed@kernel.org/
[2] https://lore.kernel.org/all/Y+Z7lVVWqnRBiPh2@nvidia.com/

* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Configure IPsec steering for egress RoCEv2 traffic
  net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic
  net/mlx5: Add IPSec priorities in RDMA namespaces
  net/mlx5: Implement new destination type TABLE_TYPE
  net/mlx5: Introduce new destination type TABLE_TYPE
====================

Link: https://lore.kernel.org/r/20230215095624.1365200-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-16 11:36:14 -08:00
Patrisious Haddad 7368f221e0 net/mlx5: Introduce new destination type TABLE_TYPE
This new destination type supports flow transition between different
table types, e.g. from NIC_RX to RDMA_RX or from RDMA_TX to NIC_TX.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-15 11:29:43 +02:00
Jakub Kicinski 9245b518c8 mlx5-next-netdev-deadlock
Merge tag 'mlx5-next-netdev-deadlock' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-next-netdev-deadlock

This series from Jiri solves a deadlock when removing a network namespace
with mlx5 devlink instance being in it.
The deadlock is between:
1) mlx5_ib->unregister_netdevice_notifier()
AND
2) mlx5_core->devlink_reload->cleanup_net()

To solve this, mlx5 netdev added/removed events are introduced to track
the uplink netdev, to be used for register_netdevice_notifier_dev_net()
purposes.

* tag 'mlx5-next-netdev-deadlock' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister
  net/mlx5e: Propagate an internal event in case uplink netdev changes
  net/mlx5e: Fix trap event handling
  net/mlx5: Introduce CQE error syndrome
====================

Link: https://lore.kernel.org/r/20230208005626.72930-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08 21:01:17 -08:00
Rahul Rameshbabu b63636b6c1 net/mlx5: Add firmware support for MTUTC scaled_ppm frequency adjustments
When the device is capable of handling scaled ppm values for adjusting
frequency, the driver does not convert them to ppb. Instead, the
scaled ppm value is passed directly to the device for the frequency
adjustment operation.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-04 02:07:03 -08:00
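The choice described in b63636b6c1 (pass scaled ppm through, or convert to ppb first) can be sketched as follows. Scaled ppm is parts per million with a 16-bit binary fraction, so 65536 represents 1 ppm. The function names and the returned tuple are hypothetical, and the conversion here is a plain floor-rounding approximation; the kernel's own helper may round slightly differently.

```python
def scaled_ppm_to_ppb(scaled_ppm: int) -> int:
    """Convert scaled ppm (ppm << 16) to parts per billion.

    ppb = scaled_ppm * 1000 / 65536 = scaled_ppm * 125 / 8192,
    computed with an arithmetic shift (rounds toward negative infinity).
    """
    return (scaled_ppm * 125) >> 13


def frequency_adjustment(scaled_ppm: int, device_handles_scaled_ppm: bool):
    """Return the (unit, value) pair that would be handed to the device.

    `device_handles_scaled_ppm` stands in for the capability check; it is
    an assumption of this sketch, not a driver field name.
    """
    if device_handles_scaled_ppm:
        return ("scaled_ppm", scaled_ppm)            # passed through unchanged
    return ("ppb", scaled_ppm_to_ppb(scaled_ppm))    # driver-side conversion
```

Passing the scaled value through avoids the precision lost in the conversion, since 1 ppb is coarser than one unit of scaled ppm.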
Jianbo Liu 60c8972d2c net/mlx5: Change key type to key purpose
Change the naming of key type in DEK fields and macros, to be
consistent with the device spec.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-30 19:10:04 -08:00
Jianbo Liu 9a0ed4f2bf net/mlx5: Add IFC bits and enums for crypto key
Add and extend structure layouts and defines for fast crypto key
update. This is a prerequisite to support bulk creation, key
modification and destruction, software wrapped DEK, and SYNC_CRYPTO
command.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-30 19:10:04 -08:00
Jianbo Liu 4744c7ad22 net/mlx5: Add IFC bits for general obj create param
Before this patch, log_obj_range was defined inside
general_obj_in_cmd_hdr to support bulk allocation. However, a later
patch needs to modify/query a single object in the bulk, so turn
those fields into param bits that hold the parameters specific to the
cmd header, and add general_obj_create_param to match the updated spec.
general_obj_query_param for modify/query will be added later.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-30 19:10:04 -08:00
Rahul Rameshbabu d3c8a33a5c net/mlx5: Add hardware extended range support for PTP adjtime and adjphase
Capable hardware can use an extended range for offsetting the clock. An
extended range of [-200000,200000] is used instead of [-32768,32767] for
the delta/phase parameter of the adjtime/adjphase ptp_clock_info callbacks.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-18 10:34:07 -08:00
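The range selection described in d3c8a33a5c can be sketched as a simple bounds check. The two ranges are the ones quoted in the commit message; the constant and function names are hypothetical, and the commit message does not state the unit of the delta, so none is assumed here.

```python
# Ranges quoted in the commit message.
NARROW_RANGE = (-32768, 32767)
EXTENDED_RANGE = (-200000, 200000)


def offset_range(extended_supported: bool):
    """Pick the range the hardware accepts for the delta/phase parameter."""
    return EXTENDED_RANGE if extended_supported else NARROW_RANGE


def fits_hw_offset(delta: int, extended_supported: bool) -> bool:
    """True if `delta` falls inside the range selected for this hardware."""
    low, high = offset_range(extended_supported)
    return low <= delta <= high
```

For example, a delta of 100000 falls inside the extended range but outside the narrow one.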
Patrisious Haddad 9b2e372372 net/mlx5: Introduce CQE error syndrome
Introduces CQE error syndrome bits which are inside qp_context_extension
and are used to report the reason the QP was moved to error state.
Useful for cases in which a CQE isn't generated, such as remote write
rkey violation.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/f8359315f8130f6d2abe4b94409ac7802f54bce3.1672821186.git.leonro@nvidia.com
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-15 12:17:50 +02:00
Maher Sanalla 8d231dbc3b net/mlx5: Expose shared buffer registers bits and structs
Add the shared receive buffer management and configuration registers:
1. SBPR - Shared Buffer Pools Register
2. SBCM - Shared Buffer Class Management Register

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-10 21:24:39 -08:00
Moshe Shemesh 1f0ae22ab4 net/mlx5: E-Switch, properly handle ingress tagged packets on VST
Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already
present in the frame. Previous VST mode behavior was to drop packets or
override existing tag, depending on the device version.

In this patch we fix this behavior by correctly building the HW steering
rule with a push vlan action, or for older devices we ask the FW to stack
the vlan when a vlan is already present.

Fixes: 07bab95026 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: dfcb1ed3c3 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28 11:38:49 -08:00
Linus Torvalds 785d21ba2f VFIO updates for v6.2-rc1
Merge tag 'vfio-v6.2-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Replace deprecated git://github.com link in MAINTAINERS (Palmer
   Dabbelt)

 - Simplify vfio/mlx5 with module_pci_driver() helper (Shang XiaoJing)

 - Drop unnecessary buffer from ACPI call (Rafael Mendonca)

 - Correct latent missing include issue in iova-bitmap and fix support
   for unaligned bitmaps. Follow-up with better fix through refactor
   (Joao Martins)

 - Rework ccw mdev driver to split private data from parent structure,
   better aligning with the mdev lifecycle and allowing us to remove a
   temporary workaround (Eric Farman)

 - Add an interface to get an estimated migration data size for a
   device, allowing userspace to make informed decisions, ex. more
   accurately predicting VM downtime (Yishai Hadas)

 - Fix minor typo in vfio/mlx5 array declaration (Yishai Hadas)

 - Simplify module and Kconfig through consolidating SPAPR/EEH code and
   config options and folding virqfd module into main vfio module (Jason
   Gunthorpe)

 - Fix error path from device_register() across all vfio mdev and sample
   drivers (Alex Williamson)

 - Define migration pre-copy interface and implement for vfio/mlx5
   devices, allowing portions of the device state to be saved while the
   device continues operation, towards reducing the stop-copy state size
   (Jason Gunthorpe, Yishai Hadas, Shay Drory)

 - Implement pre-copy for hisi_acc devices (Shameer Kolothum)

 - Fixes to mdpy mdev driver remove path and error path on probe (Shang
   XiaoJing)

 - vfio/mlx5 fixes for incorrect return after copy_to_user() fault and
   incorrect buffer freeing (Dan Carpenter)

* tag 'vfio-v6.2-rc1' of https://github.com/awilliam/linux-vfio: (42 commits)
  vfio/mlx5: error pointer dereference in error handling
  vfio/mlx5: fix error code in mlx5vf_precopy_ioctl()
  samples: vfio-mdev: Fix missing pci_disable_device() in mdpy_fb_probe()
  hisi_acc_vfio_pci: Enable PRE_COPY flag
  hisi_acc_vfio_pci: Move the dev compatibility tests for early check
  hisi_acc_vfio_pci: Introduce support for PRE_COPY state transitions
  hisi_acc_vfio_pci: Add support for precopy IOCTL
  vfio/mlx5: Enable MIGRATION_PRE_COPY flag
  vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error
  vfio/mlx5: Introduce multiple loads
  vfio/mlx5: Consider temporary end of stream as part of PRE_COPY
  vfio/mlx5: Introduce vfio precopy ioctl implementation
  vfio/mlx5: Introduce SW headers for migration states
  vfio/mlx5: Introduce device transitions of PRE_COPY
  vfio/mlx5: Refactor to use queue based data chunks
  vfio/mlx5: Refactor migration file state
  vfio/mlx5: Refactor MKEY usage
  vfio/mlx5: Refactor PD usage
  vfio/mlx5: Enforce a single SAVE command at a time
  vfio: Extend the device migration protocol with PRE_COPY
  ...
2022-12-15 13:12:15 -08:00
Jakub Kicinski dd8b3a802b ipsec-next-2022-12-09
Merge tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
ipsec-next 2022-12-09

1) Add xfrm packet offload core API.
   From Leon Romanovsky.

2) Add xfrm packet offload support for mlx5.
   From Leon Romanovsky and Raed Salem.

3) Fix a typo in an error message.
   From Colin Ian King.

* tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: (38 commits)
  xfrm: Fix spelling mistake "oflload" -> "offload"
  net/mlx5e: Open mlx5 driver to accept IPsec packet offload
  net/mlx5e: Handle ESN update events
  net/mlx5e: Handle hardware IPsec limits events
  net/mlx5e: Update IPsec soft and hard limits
  net/mlx5e: Store all XFRM SAs in Xarray
  net/mlx5e: Provide intermediate pointer to access IPsec struct
  net/mlx5e: Skip IPsec encryption for TX path without matching policy
  net/mlx5e: Add statistics for Rx/Tx IPsec offloaded flows
  net/mlx5e: Improve IPsec flow steering autogroup
  net/mlx5e: Configure IPsec packet offload flow steering
  net/mlx5e: Use same coding pattern for Rx and Tx flows
  net/mlx5e: Add XFRM policy offload logic
  net/mlx5e: Create IPsec policy offload tables
  net/mlx5e: Generalize creation of default IPsec miss group and rule
  net/mlx5e: Group IPsec miss handles into separate struct
  net/mlx5e: Make clear what IPsec rx_err does
  net/mlx5e: Flatten the IPsec RX add rule path
  net/mlx5e: Refactor FTE setup code to be more clear
  net/mlx5e: Move IPsec flow table creation to separate function
  ...
====================

Link: https://lore.kernel.org/r/20221209093310.4018731-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-09 20:06:35 -08:00
Yevgeny Kliteynik f1543c7aba net/mlx5: mlx5_ifc updates for MATCH_DEFINER general object
Update full structure of match definer and add an ID of
the SELECT match definer type.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08 16:10:53 -08:00
Yishai Hadas df268f6ca7 net/mlx5: Introduce IFC bits for migratable
Introduce IFC capabilities that allow a VF to be set up to
perform live migration, i.e. to be migratable.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-07 20:09:17 -08:00
Shay Drory c943a9374d net/mlx5: Introduce ifc bits for pre_copy
Introduce the ifc bits needed to enable PRE_COPY of a VF during migration.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20221206083438.37807-2-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06 12:36:43 -07:00
Leon Romanovsky 3afee4ed33 net/mlx5: Add HW definitions for IPsec packet offload
Add all needed bits to support IPsec packet offload mode.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-06 13:54:04 +01:00
Linus Torvalds e08466a7c0 v6.1 merge window pull request
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Not a big list of changes this cycle, mostly small things. The new
  MANA rdma driver should come next cycle along with a bunch of work on
  rxe.

  Summary:

   - Small bug fixes in mlx5, efa, rxe, hns, irdma, erdma, siw

   - rts tracing improvements

   - Code improvements: strscpy conversion, unused parameter, spelling
     mistakes, unused variables, flex arrays

   - restrack device details report for hns

   - Simplify struct device initialization in SRP

   - Eliminate the never-used service_mask support in IB CM

   - Make rxe not print to the console for some kinds of network packets

   - Asymmetric paths and router support in the CM through netlink
     messages

   - DMABUF importer support for mlx5devx umem's"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (84 commits)
  RDMA/rxe: Remove error/warning messages from packet receiver path
  RDMA/usnic: fix set-but-not-unused variable 'flags' warning
  IB/hfi1: Use skb_put_data() instead of skb_put/memcpy pair
  RDMA/hns: Unified Log Printing Style
  RDMA/hns: Replacing magic number with macros in apply_func_caps()
  RDMA/hns: Repacing 'dseg_len' by macros in fill_ext_sge_inl_data()
  RDMA/hns: Remove redundant 'max_srq_desc_sz' in caps
  RDMA/hns: Remove redundant 'num_mtt_segs' and 'max_extend_sg'
  RDMA/hns: Remove redundant 'phy_addr' in hns_roce_hem_list_find_mtt()
  RDMA/hns: Remove redundant 'use_lowmem' argument from hns_roce_init_hem_table()
  RDMA/hns: Remove redundant 'bt_level' for hem_list_alloc_item()
  RDMA/hns: Remove redundant 'attr_mask' in modify_qp_init_to_init()
  RDMA/hns: Remove unnecessary brackets when getting point
  RDMA/hns: Remove unnecessary braces for single statement blocks
  RDMA/hns: Cleanup for a spelling error of Asynchronous
  IB/rdmavt: Add __init/__exit annotations to module init/exit funcs
  RDMA/rxe: Remove redundant num_sge fields
  RDMA/mlx5: Enable ATS support for MRs and umems
  RDMA/mlx5: Add support for dmabuf to devx umem
  RDMA/core: Add UVERBS_ATTR_RAW_FD
  ...
2022-10-07 12:05:29 -07:00
Gal Pressman 16ab85e784 net/mlx5e: Expose rx_oversize_pkts_buffer counter
Add the rx_oversize_pkts_buffer counter to ethtool statistics.
This counter exposes the number of received packets that were dropped
because they arrived at the RQ with a length exceeding the software
buffer size allocated by the device for incoming traffic. It might imply
that the device MTU is larger than the software buffer size.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03 16:55:29 -07:00
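What rx_oversize_pkts_buffer counts, as described in 16ab85e784, can be modeled in a few lines. This is a hypothetical model of the condition only: the real counter is maintained by the device and read through ethtool statistics, and `count_oversize_drops` is not a driver function.

```python
def count_oversize_drops(packet_lengths, sw_buffer_size: int) -> int:
    """Count packets whose length exceeds the software buffer size.

    Models the drop condition from the commit message: a packet that
    reaches the RQ but is longer than the buffer allocated for it.
    """
    return sum(1 for length in packet_lengths if length > sw_buffer_size)
```

With 2048-byte buffers, lengths of 1500 and 2048 fit, while 4096 and 9000 would each increment the counter.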
Maxim Mikityanskiy 40b72108f9 net/mlx5: Add the log_min_mkey_entity_size capability
Add the capability that allows the driver to determine the minimal
MTT page size, so that it can map the smallest possible pages in XSK.
Older firmware that doesn't have this capability defaults to 12 (i.e.
4096-byte pages).

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28 19:36:33 -07:00
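The fallback described in 40b72108f9 can be sketched as below. The capability holds a log2 value, so 12 corresponds to 4096-byte pages. The sketch assumes that firmware lacking the capability reports it as 0; that encoding and the helper names are assumptions of this example, not taken from the commit.

```python
DEFAULT_LOG_MIN_ENTITY_SIZE = 12  # 2**12 = 4096-byte pages, per the commit message


def min_page_shift(log_min_mkey_entity_size: int) -> int:
    """Return the minimal page shift, falling back to 12 when the cap reads 0.

    Treating 0 as "capability not reported" is an assumption of this sketch.
    """
    return log_min_mkey_entity_size or DEFAULT_LOG_MIN_ENTITY_SIZE


def min_page_size(log_min_mkey_entity_size: int) -> int:
    """Minimal page size in bytes."""
    return 1 << min_page_shift(log_min_mkey_entity_size)
```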
Jakub Kicinski 0d5bfebf74 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
updates from mlx5-next 2022-09-24

Updates from mlx5-next, including [1]:

1) HW definitions and support for NPPS clock settings.

2) various cleanups

3) Enable hash mode by default for all NICs

4) page tracker and advanced virtualization HW definitions for vfio

[1] https://lore.kernel.org/netdev/20220907233636.388475-1-saeed@kernel.org/

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Remove from FPGA IFC file not-needed definitions
  net/mlx5: Remove unused structs
  net/mlx5: Remove unused functions
  net/mlx5: detect and enable bypass port select flow table
  net/mlx5: Lag, enable hash mode by default for all NICs
  net/mlx5: Lag, set active ports if support bypass port select flow table
  RDMA/mlx5: Don't set tx affinity when lag is in hash mode
  net/mlx5: add IFC bits for bypassing port select flow table
  net/mlx5: Add support for NPPS with real time mode
  net/mlx5: Expose NPPS related registers
  net/mlx5: Query ADV_VIRTUALIZATION capabilities
  net/mlx5: Introduce ifc bits for page tracker
  RDMA/mlx5: Move function mlx5_core_query_ib_ppcnt() to mlx5_ib
====================

Link: https://lore.kernel.org/all/20220927201906.234015-1-saeed@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28 19:20:49 -07:00
Leon Romanovsky 9175d81037 net/mlx5: Remove from FPGA IFC file not-needed definitions
Move the IP layout bit definitions close to the place that actually
uses them, and remove extra defines that are not in use.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-09-27 12:50:27 -07:00
Liu, Changcheng 8d1ac895ff net/mlx5: add IFC bits for bypassing port select flow table
port_select_flow_table_bypass - When set, the device supports
bypassing the port select flow table.
active_port - Bitmask indicating the currently active ports
in PORT_SELECT_FT LAG.
MLX5_SET_HCA_CAP_OP_MODE_PORT_SELECTION - op_mod to operate on
PORT_SELECTION_Capabilities.

Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-09-27 12:50:27 -07:00
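The active_port bitmask from 8d1ac895ff can be illustrated with a pair of helpers. The bit layout is an assumption of this sketch (bit 0 for port 1, bit 1 for port 2, and so on); the commit message only says the field is a bitmask of the currently active ports, and both function names are hypothetical.

```python
def mask_from_ports(active_ports) -> int:
    """Build a bitmask from 1-based port numbers (assumed layout: bit i-1 for port i)."""
    mask = 0
    for port in active_ports:
        mask |= 1 << (port - 1)
    return mask


def ports_from_mask(mask: int, num_ports: int):
    """List the 1-based port numbers whose bit is set in `mask`."""
    return [port for port in range(1, num_ports + 1) if mask & (1 << (port - 1))]
```

With two ports in a LAG, a mask of 0b11 would mean both are active, and 0b10 would mean only port 2 is.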
Aya Levin 976a859c9c net/mlx5: Expose NPPS related registers
Add management capability bits indicating firmware may support N pulses
per second. Add corresponding fields in MTPPS register.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-09-27 12:50:26 -07:00
Jason Gunthorpe 4bf207d7a5 net/mlx5: Add IFC bits for mkey ATS
Allows telling a mkey to use PCI ATS for DMA that flows through it.

Link: https://lore.kernel.org/r/1-v1-bd147097458e+ede-umem_dmabuf_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27 10:15:24 -03:00
Emeel Hakim 23cc83c6ca net/mlx5: Add ifc bits for MACsec extended packet number (EPN) and replay protection
Add ifc bits related to advanced steering operations (ASO) and general
object modify for macsec to use as part of offloading EPN and replay
protection features.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 18:01:32 -07:00
Emeel Hakim 21803630c4 net/mlx5: Fix fields name prefix in MACsec
Fix ifc field names to be consistent with the device spec document.

Fixes: 8385c51ff5 ("net/mlx5: Introduce MACsec Connect-X offload hardware bits and structures")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 18:01:32 -07:00