Merge branch 'for-6.15/io_uring-rx-zc' into for-6.15/io_uring-reg-vec

* for-6.15/io_uring-rx-zc: (80 commits)
  io_uring/zcrx: add selftest case for recvzc with read limit
  io_uring/zcrx: add a read limit to recvzc requests
  io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap
  io_uring: Rename KConfig to Kconfig
  io_uring/zcrx: fix leaks on failed registration
  io_uring/zcrx: recheck ifq on shutdown
  io_uring/zcrx: add selftest
  net: add documentation for io_uring zcrx
  io_uring/zcrx: add copy fallback
  io_uring/zcrx: throttle receive requests
  io_uring/zcrx: set pp memory provider for an rx queue
  io_uring/zcrx: add io_recvzc request
  io_uring/zcrx: dma-map area for the device
  io_uring/zcrx: implement zerocopy receive pp memory provider
  io_uring/zcrx: grab a net device
  io_uring/zcrx: add io_zcrx_area
  io_uring/zcrx: add interface queue and refill queue
  net: add helpers for setting a memory provider on an rx queue
  net: page_pool: add memory provider helpers
  net: prepare for non devmem TCP memory providers
  ...
pull/1188/head
Jens Axboe 2025-03-07 09:07:11 -07:00
commit 78b6f6e9bf
107 changed files with 3558 additions and 3361 deletions

View File

@ -244,7 +244,7 @@ information about the interrupt from the irb parameter.
--------------------
The ccwgroup mechanism is designed to handle devices consisting of multiple ccw
devices, like lcs or ctc.
devices, like qeth or ctc.
The ccw driver provides a 'group' attribute. Piping bus ids of ccw devices to
this attributes creates a ccwgroup device consisting of these ccw devices (if

View File

@ -44,6 +44,9 @@ properties:
phy-mode:
enum:
- rgmii
- rgmii-id
- rgmii-rxid
- rgmii-txid
- rmii
phy-handle: true

View File

@ -14,9 +14,10 @@ $defs:
pattern: ^[0-9A-Za-z_-]+( - 1)?$
minimum: 0
len-or-limit:
# literal int or limit based on fixed-width type e.g. u8-min, u16-max, etc.
# literal int, const name, or limit based on fixed-width type
# e.g. u8-min, u16-max, etc.
type: [ string, integer ]
pattern: ^[su](8|16|32|64)-(min|max)$
pattern: ^[0-9A-Za-z_-]+$
minimum: 0
# Schema for specs

View File

@ -14,9 +14,10 @@ $defs:
pattern: ^[0-9A-Za-z_-]+( - 1)?$
minimum: 0
len-or-limit:
# literal int or limit based on fixed-width type e.g. u8-min, u16-max, etc.
# literal int, const name, or limit based on fixed-width type
# e.g. u8-min, u16-max, etc.
type: [ string, integer ]
pattern: ^[su](8|16|32|64)-(min|max)$
pattern: ^[0-9A-Za-z_-]+$
minimum: 0
# Schema for specs

View File

@ -14,9 +14,10 @@ $defs:
pattern: ^[0-9A-Za-z_-]+( - 1)?$
minimum: 0
len-or-limit:
# literal int or limit based on fixed-width type e.g. u8-min, u16-max, etc.
# literal int, const name, or limit based on fixed-width type
# e.g. u8-min, u16-max, etc.
type: [ string, integer ]
pattern: ^[su](8|16|32|64)-(min|max)$
pattern: ^[0-9A-Za-z_-]+$
minimum: 0
# Schema for specs

View File

@ -114,6 +114,9 @@ attribute-sets:
doc: Bitmask of enabled AF_XDP features.
type: u64
enum: xsk-flags
-
name: io-uring-provider-info
attributes: []
-
name: page-pool
attributes:
@ -171,6 +174,11 @@ attribute-sets:
name: dmabuf
doc: ID of the dmabuf this page-pool is attached to.
type: u32
-
name: io-uring
doc: io-uring memory provider information.
type: nest
nested-attributes: io-uring-provider-info
-
name: page-pool-info
subset-of: page-pool
@ -296,6 +304,11 @@ attribute-sets:
name: dmabuf
doc: ID of the dmabuf attached to this queue, if any.
type: u32
-
name: io-uring
doc: io_uring memory provider information.
type: nest
nested-attributes: io-uring-provider-info
-
name: qstats
@ -572,6 +585,7 @@ operations:
- inflight-mem
- detach-time
- dmabuf
- io-uring
dump:
reply: *pp-reply
config-cond: page-pool
@ -637,6 +651,7 @@ operations:
- napi-id
- ifindex
- dmabuf
- io-uring
dump:
request:
attributes:

View File

@ -63,6 +63,7 @@ Contents:
gtp
ila
ioam6-sysctl
iou-zcrx
ip_dynaddr
ipsec
ip-sysctl

View File

@ -0,0 +1,202 @@
.. SPDX-License-Identifier: GPL-2.0
=====================
io_uring zero copy Rx
=====================
Introduction
============
io_uring zero copy Rx (ZC Rx) is a feature that removes kernel-to-user copy on
the network receive path, allowing packet data to be received directly into
userspace memory. This feature differs from TCP_ZEROCOPY_RECEIVE in that
there are no strict alignment requirements and no need to mmap()/munmap().
Unlike kernel bypass solutions such as DPDK, the packet headers are still
processed by the kernel TCP stack as normal.
NIC HW Requirements
===================
Several NIC HW features are required for io_uring ZC Rx to work. For now, the
kernel API does not configure the NIC; this must be done by the user.
Header/data split
-----------------
Required to split packets at the L4 boundary into a header and a payload.
Headers are received into kernel memory and processed by the TCP stack as
normal, while payloads are received directly into userspace memory.
Flow steering
-------------
Specific HW Rx queues are configured for this feature, but modern NICs
typically distribute flows across all HW Rx queues. Flow steering is required
to ensure that only desired flows are directed towards HW queues that are
configured for io_uring ZC Rx.
RSS
---
In addition to flow steering above, RSS is required to steer all other non-zero
copy flows away from queues that are configured for io_uring ZC Rx.
Usage
=====
Setup NIC
---------
Must be done out of band for now.
Ensure there are at least two queues::
ethtool -L eth0 combined 2
Enable header/data split::
ethtool -G eth0 tcp-data-split on
Carve out half of the HW Rx queues for zero copy using RSS::
ethtool -X eth0 equal 1
Set up flow steering, bearing in mind that queues are 0-indexed::
ethtool -N eth0 flow-type tcp6 ... action 1
Setup io_uring
--------------
This section describes the low level io_uring kernel API. Please refer to
liburing documentation for how to use the higher level API.
Create an io_uring instance with the following required setup flags::
IORING_SETUP_SINGLE_ISSUER
IORING_SETUP_DEFER_TASKRUN
IORING_SETUP_CQE32
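For reference, a minimal liburing sketch of creating such an instance; the
queue depth and error handling here are illustrative assumptions rather than
requirements::

struct io_uring_params params = {
    .flags = IORING_SETUP_SINGLE_ISSUER |
             IORING_SETUP_DEFER_TASKRUN |
             IORING_SETUP_CQE32,
};
struct io_uring ring;

/* 128 SQ entries chosen arbitrarily for this example */
int ret = io_uring_queue_init_params(128, &ring, &params);
if (ret)
    /* setup failed, ret is a negative errno */
    return ret;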
Create memory area
------------------
Allocate userspace memory area for receiving zero copy data::
void *area_ptr = mmap(NULL, area_size,
PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE,
0, 0);
Create refill ring
------------------
Allocate memory for a shared ringbuf used for returning consumed buffers::
void *ring_ptr = mmap(NULL, ring_size,
PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE,
0, 0);
This refill ring consists of some space for the header, followed by an array of
``struct io_uring_zcrx_rqe``::
size_t rq_entries = 4096;
size_t ring_size = rq_entries * sizeof(struct io_uring_zcrx_rqe) + PAGE_SIZE;
/* align to page size */
ring_size = (ring_size + (PAGE_SIZE - 1)) & ~(PAGE_SIZE - 1);
Register ZC Rx
--------------
Fill in registration structs::
struct io_uring_zcrx_area_reg area_reg = {
.addr = (__u64)(unsigned long)area_ptr,
.len = area_size,
.flags = 0,
};
struct io_uring_region_desc region_reg = {
.user_addr = (__u64)(unsigned long)ring_ptr,
.size = ring_size,
.flags = IORING_MEM_REGION_TYPE_USER,
};
struct io_uring_zcrx_ifq_reg reg = {
.if_idx = if_nametoindex("eth0"),
/* this is the HW queue with desired flow steered into it */
.if_rxq = 1,
.rq_entries = rq_entries,
.area_ptr = (__u64)(unsigned long)&area_reg,
.region_ptr = (__u64)(unsigned long)&region_reg,
};
Register with kernel::
io_uring_register_ifq(ring, &reg);
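Assuming the usual liburing convention of returning 0 on success and a
negative errno on failure, a sketch of checking the result; on success the
kernel fills in ``reg.offsets`` and ``area_reg.rq_area_token``, which are used
in the steps below::

int ret = io_uring_register_ifq(ring, &reg);
if (ret)
    /* registration failed, e.g. unsupported kernel or NIC configuration */
    return ret;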
Map refill ring
---------------
The kernel fills in fields for the refill ring in the registration ``struct
io_uring_zcrx_ifq_reg``. Map it into userspace::
struct io_uring_zcrx_rq refill_ring;
refill_ring.khead = (unsigned *)((char *)ring_ptr + reg.offsets.head);
refill_ring.ktail = (unsigned *)((char *)ring_ptr + reg.offsets.tail);
refill_ring.rqes =
(struct io_uring_zcrx_rqe *)((char *)ring_ptr + reg.offsets.rqes);
refill_ring.rq_tail = 0;
refill_ring.ring_ptr = ring_ptr;
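Note that ``refill_ring.ring_entries``, used for masking later, is not set by
the mapping above. A minimal sketch, assuming it simply mirrors the
``rq_entries`` value used at registration (a power of two)::

refill_ring.ring_entries = rq_entries;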
Receiving data
--------------
Prepare a zero copy recv request::
struct io_uring_sqe *sqe;
sqe = io_uring_get_sqe(ring);
io_uring_prep_rw(IORING_OP_RECV_ZC, sqe, fd, NULL, 0, 0);
sqe->ioprio |= IORING_RECV_MULTISHOT;
Now, submit and wait::
io_uring_submit_and_wait(ring, 1);
Finally, process completions::
struct io_uring_cqe *cqe;
unsigned int count = 0;
unsigned int head;
io_uring_for_each_cqe(ring, head, cqe) {
struct io_uring_zcrx_cqe *rcqe = (struct io_uring_zcrx_cqe *)(cqe + 1);
unsigned long mask = (1ULL << IORING_ZCRX_AREA_SHIFT) - 1;
unsigned char *data = area_ptr + (rcqe->off & mask);
/* do something with the data */
count++;
}
io_uring_cq_advance(ring, count);
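Because the request is multishot, a single submission can produce many such
completions. A hedged sketch of additional per-CQE checks that would sit
inside the loop above: a negative ``cqe->res`` indicates an error, and a
cleared ``IORING_CQE_F_MORE`` flag means the request has terminated and needs
to be re-armed::

if (cqe->res < 0) {
    /* error on the socket or on the request itself */
    break;
}
if (!(cqe->flags & IORING_CQE_F_MORE)) {
    /* no further completions will arrive from this request;
     * prepare and submit a new IORING_OP_RECV_ZC to keep receiving
     */
}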
Recycling buffers
-----------------
Return buffers back to the kernel to be used again::
struct io_uring_zcrx_rqe *rqe;
unsigned mask = refill_ring.ring_entries - 1;
rqe = &refill_ring.rqes[refill_ring.rq_tail & mask];
unsigned long area_offset = rcqe->off & ~IORING_ZCRX_AREA_MASK;
rqe->off = area_offset | area_reg.rq_area_token;
rqe->len = cqe->res;
IO_URING_WRITE_ONCE(*refill_ring.ktail, ++refill_ring.rq_tail);
Testing
=======
See ``tools/testing/selftests/drivers/net/hw/iou-zcrx.c``

View File

@ -30,3 +30,5 @@ source "lib/Kconfig"
source "lib/Kconfig.debug"
source "Documentation/Kconfig"
source "io_uring/Kconfig"

View File

@ -54,7 +54,6 @@ enum interruption_class {
IRQIO_C70,
IRQIO_TAP,
IRQIO_VMR,
IRQIO_LCS,
IRQIO_CTC,
IRQIO_ADM,
IRQIO_CSC,

View File

@ -84,7 +84,6 @@ static const struct irq_class irqclass_sub_desc[] = {
{.irq = IRQIO_C70, .name = "C70", .desc = "[I/O] 3270"},
{.irq = IRQIO_TAP, .name = "TAP", .desc = "[I/O] Tape"},
{.irq = IRQIO_VMR, .name = "VMR", .desc = "[I/O] Unit Record Devices"},
{.irq = IRQIO_LCS, .name = "LCS", .desc = "[I/O] LCS"},
{.irq = IRQIO_CTC, .name = "CTC", .desc = "[I/O] CTC"},
{.irq = IRQIO_ADM, .name = "ADM", .desc = "[I/O] EADM Subchannel"},
{.irq = IRQIO_CSC, .name = "CSC", .desc = "[I/O] CHSC Subchannel"},

View File

@ -432,9 +432,6 @@ static struct net_device *bond_ipsec_dev(struct xfrm_state *xs)
struct bonding *bond;
struct slave *slave;
if (!bond_dev)
return NULL;
bond = netdev_priv(bond_dev);
if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
return NULL;

View File

@ -226,7 +226,6 @@ struct __packed offload_info {
struct offload_port_info ports;
struct offload_ka_info kas;
struct offload_rr_info rrs;
u8 buf[];
};
struct __packed hw_atl_utils_fw_rpc {

View File

@ -1433,22 +1433,6 @@ int octeon_wait_for_ddr_init(struct octeon_device *oct, u32 *timeout)
}
EXPORT_SYMBOL_GPL(octeon_wait_for_ddr_init);
/* Get the octeon id assigned to the octeon device passed as argument.
* This function is exported to other modules.
* @param dev - octeon device pointer passed as a void *.
* @return octeon device id
*/
int lio_get_device_id(void *dev)
{
struct octeon_device *octeon_dev = (struct octeon_device *)dev;
u32 i;
for (i = 0; i < MAX_OCTEON_DEVICES; i++)
if (octeon_device[i] == octeon_dev)
return octeon_dev->octeon_id;
return -1;
}
void lio_enable_irq(struct octeon_droq *droq, struct octeon_instr_queue *iq)
{
u64 instr_cnt;

View File

@ -705,13 +705,6 @@ octeon_get_dispatch(struct octeon_device *octeon_dev, u16 opcode,
*/
struct octeon_device *lio_get_device(u32 octeon_id);
/** Get the octeon id assigned to the octeon device passed as argument.
* This function is exported to other modules.
* @param dev - octeon device pointer passed as a void *.
* @return octeon device id
*/
int lio_get_device_id(void *dev);
/** Read windowed register.
* @param oct - pointer to the Octeon device.
* @param addr - Address of the register to read.

View File

@ -1211,9 +1211,6 @@ struct adapter {
struct timer_list flower_stats_timer;
struct work_struct flower_stats_work;
/* Ethtool Dump */
struct ethtool_dump eth_dump;
/* HMA */
struct hma_data hma;
@ -1233,6 +1230,10 @@ struct adapter {
/* Ethtool n-tuple */
struct cxgb4_ethtool_filter *ethtool_filters;
/* Ethtool Dump */
/* Must be last - ends in a flex-array member. */
struct ethtool_dump eth_dump;
};
/* Support for "sched-class" command to allow a TX Scheduling Class to be

View File

@ -526,28 +526,6 @@ out:
return res;
}
u32 mlx4_zone_free_entries(struct mlx4_zone_allocator *zones, u32 uid, u32 obj, u32 count)
{
struct mlx4_zone_entry *zone;
int res = 0;
spin_lock(&zones->lock);
zone = __mlx4_find_zone_by_uid(zones, uid);
if (NULL == zone) {
res = -1;
goto out;
}
__mlx4_free_from_zone(zone, obj, count);
out:
spin_unlock(&zones->lock);
return res;
}
u32 mlx4_zone_free_entries_unique(struct mlx4_zone_allocator *zones, u32 obj, u32 count)
{
struct mlx4_zone_entry *zone;

View File

@ -1478,12 +1478,6 @@ void mlx4_zone_allocator_destroy(struct mlx4_zone_allocator *zone_alloc);
u32 mlx4_zone_alloc_entries(struct mlx4_zone_allocator *zones, u32 uid, int count,
int align, u32 skip_mask, u32 *puid);
/* Free <count> objects, start from <obj> of the uid <uid> from zone_allocator
* <zones>.
*/
u32 mlx4_zone_free_entries(struct mlx4_zone_allocator *zones,
u32 uid, u32 obj, u32 count);
/* If <zones> was allocated with MLX4_ZONE_ALLOC_FLAGS_NO_OVERLAP, instead of
* specifying the uid when freeing an object, zone allocator could figure it by
* itself. Other parameters are similar to mlx4_zone_free.

View File

@ -147,26 +147,6 @@ static int mlx4_set_port_mac_table(struct mlx4_dev *dev, u8 port,
return err;
}
int mlx4_find_cached_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *idx)
{
struct mlx4_port_info *info = &mlx4_priv(dev)->port[port];
struct mlx4_mac_table *table = &info->mac_table;
int i;
for (i = 0; i < MLX4_MAX_MAC_NUM; i++) {
if (!table->refs[i])
continue;
if (mac == (MLX4_MAC_MASK & be64_to_cpu(table->entries[i]))) {
*idx = i;
return 0;
}
}
return -ENOENT;
}
EXPORT_SYMBOL_GPL(mlx4_find_cached_mac);
static bool mlx4_need_mf_bond(struct mlx4_dev *dev)
{
int i, num_eth_ports = 0;

View File

@ -296,11 +296,16 @@ enum mlx5e_fec_supported_link_mode {
MLX5E_FEC_SUPPORTED_LINK_MODE_200G_2X,
MLX5E_FEC_SUPPORTED_LINK_MODE_400G_4X,
MLX5E_FEC_SUPPORTED_LINK_MODE_800G_8X,
MLX5E_FEC_SUPPORTED_LINK_MODE_200G_1X,
MLX5E_FEC_SUPPORTED_LINK_MODE_400G_2X,
MLX5E_FEC_SUPPORTED_LINK_MODE_800G_4X,
MLX5E_FEC_SUPPORTED_LINK_MODE_1600G_8X,
MLX5E_MAX_FEC_SUPPORTED_LINK_MODE,
};
#define MLX5E_FEC_FIRST_50G_PER_LANE_MODE MLX5E_FEC_SUPPORTED_LINK_MODE_50G_1X
#define MLX5E_FEC_FIRST_100G_PER_LANE_MODE MLX5E_FEC_SUPPORTED_LINK_MODE_100G_1X
#define MLX5E_FEC_FIRST_200G_PER_LANE_MODE MLX5E_FEC_SUPPORTED_LINK_MODE_200G_1X
#define MLX5E_FEC_OVERRIDE_ADMIN_POLICY(buf, policy, write, link) \
do { \
@ -320,8 +325,10 @@ static bool mlx5e_is_fec_supported_link_mode(struct mlx5_core_dev *dev,
return link_mode < MLX5E_FEC_FIRST_50G_PER_LANE_MODE ||
(link_mode < MLX5E_FEC_FIRST_100G_PER_LANE_MODE &&
MLX5_CAP_PCAM_FEATURE(dev, fec_50G_per_lane_in_pplm)) ||
(link_mode >= MLX5E_FEC_FIRST_100G_PER_LANE_MODE &&
MLX5_CAP_PCAM_FEATURE(dev, fec_100G_per_lane_in_pplm));
(link_mode < MLX5E_FEC_FIRST_200G_PER_LANE_MODE &&
MLX5_CAP_PCAM_FEATURE(dev, fec_100G_per_lane_in_pplm)) ||
(link_mode >= MLX5E_FEC_FIRST_200G_PER_LANE_MODE &&
MLX5_CAP_PCAM_FEATURE(dev, fec_200G_per_lane_in_pplm));
}
/* get/set FEC admin field for a given speed */
@ -368,6 +375,18 @@ static int mlx5e_fec_admin_field(u32 *pplm, u16 *fec_policy, bool write,
case MLX5E_FEC_SUPPORTED_LINK_MODE_800G_8X:
MLX5E_FEC_OVERRIDE_ADMIN_POLICY(pplm, *fec_policy, write, 800g_8x);
break;
case MLX5E_FEC_SUPPORTED_LINK_MODE_200G_1X:
MLX5E_FEC_OVERRIDE_ADMIN_POLICY(pplm, *fec_policy, write, 200g_1x);
break;
case MLX5E_FEC_SUPPORTED_LINK_MODE_400G_2X:
MLX5E_FEC_OVERRIDE_ADMIN_POLICY(pplm, *fec_policy, write, 400g_2x);
break;
case MLX5E_FEC_SUPPORTED_LINK_MODE_800G_4X:
MLX5E_FEC_OVERRIDE_ADMIN_POLICY(pplm, *fec_policy, write, 800g_4x);
break;
case MLX5E_FEC_SUPPORTED_LINK_MODE_1600G_8X:
MLX5E_FEC_OVERRIDE_ADMIN_POLICY(pplm, *fec_policy, write, 1600g_8x);
break;
default:
return -EINVAL;
}
@ -421,6 +440,18 @@ static int mlx5e_get_fec_cap_field(u32 *pplm, u16 *fec_cap,
case MLX5E_FEC_SUPPORTED_LINK_MODE_800G_8X:
*fec_cap = MLX5E_GET_FEC_OVERRIDE_CAP(pplm, 800g_8x);
break;
case MLX5E_FEC_SUPPORTED_LINK_MODE_200G_1X:
*fec_cap = MLX5E_GET_FEC_OVERRIDE_CAP(pplm, 200g_1x);
break;
case MLX5E_FEC_SUPPORTED_LINK_MODE_400G_2X:
*fec_cap = MLX5E_GET_FEC_OVERRIDE_CAP(pplm, 400g_2x);
break;
case MLX5E_FEC_SUPPORTED_LINK_MODE_800G_4X:
*fec_cap = MLX5E_GET_FEC_OVERRIDE_CAP(pplm, 800g_4x);
break;
case MLX5E_FEC_SUPPORTED_LINK_MODE_1600G_8X:
*fec_cap = MLX5E_GET_FEC_OVERRIDE_CAP(pplm, 1600g_8x);
break;
default:
return -EINVAL;
}
@ -494,6 +525,26 @@ out:
return 0;
}
static u16 mlx5e_remap_fec_conf_mode(enum mlx5e_fec_supported_link_mode link_mode,
u16 conf_fec)
{
/* RS fec in ethtool is originally mapped to MLX5E_FEC_RS_528_514.
* For link modes up to 25G per lane, the value is kept.
* For 50G or 100G per lane, it's remapped to MLX5E_FEC_RS_544_514.
* For 200G per lane, remapped to MLX5E_FEC_RS_544_514_INTERLEAVED_QUAD.
*/
if (conf_fec != BIT(MLX5E_FEC_RS_528_514))
return conf_fec;
if (link_mode >= MLX5E_FEC_FIRST_200G_PER_LANE_MODE)
return BIT(MLX5E_FEC_RS_544_514_INTERLEAVED_QUAD);
if (link_mode >= MLX5E_FEC_FIRST_50G_PER_LANE_MODE)
return BIT(MLX5E_FEC_RS_544_514);
return conf_fec;
}
int mlx5e_set_fec_mode(struct mlx5_core_dev *dev, u16 fec_policy)
{
bool fec_50g_per_lane = MLX5_CAP_PCAM_FEATURE(dev, fec_50G_per_lane_in_pplm);
@ -530,14 +581,7 @@ int mlx5e_set_fec_mode(struct mlx5_core_dev *dev, u16 fec_policy)
if (!mlx5e_is_fec_supported_link_mode(dev, i))
break;
/* RS fec in ethtool is mapped to MLX5E_FEC_RS_528_514
* to link modes up to 25G per lane and to
* MLX5E_FEC_RS_544_514 in the new link modes based on
* 50G or 100G per lane
*/
if (conf_fec == (1 << MLX5E_FEC_RS_528_514) &&
i >= MLX5E_FEC_FIRST_50G_PER_LANE_MODE)
conf_fec = (1 << MLX5E_FEC_RS_544_514);
conf_fec = mlx5e_remap_fec_conf_mode(i, conf_fec);
mlx5e_get_fec_cap_field(out, &fec_caps, i);

View File

@ -61,6 +61,7 @@ enum {
MLX5E_FEC_NOFEC,
MLX5E_FEC_FIRECODE,
MLX5E_FEC_RS_528_514,
MLX5E_FEC_RS_544_514_INTERLEAVED_QUAD = 4,
MLX5E_FEC_RS_544_514 = 7,
MLX5E_FEC_LLRS_272_257_1 = 9,
};

View File

@ -326,7 +326,7 @@ static int mlx5e_ptp_alloc_txqsq(struct mlx5e_ptp *c, int txq_ix,
int node;
sq->pdev = c->pdev;
sq->clock = &mdev->clock;
sq->clock = mdev->clock;
sq->mkey_be = c->mkey_be;
sq->netdev = c->netdev;
sq->priv = c->priv;
@ -696,7 +696,7 @@ static int mlx5e_init_ptp_rq(struct mlx5e_ptp *c, struct mlx5e_params *params,
rq->pdev = c->pdev;
rq->netdev = priv->netdev;
rq->priv = priv;
rq->clock = &mdev->clock;
rq->clock = mdev->clock;
rq->tstamp = &priv->tstamp;
rq->mdev = mdev;
rq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu);

View File

@ -73,11 +73,6 @@ struct mlx5e_tc_act {
bool is_terminating_action;
};
struct mlx5e_tc_flow_action {
unsigned int num_entries;
struct flow_action_entry **entries;
};
extern struct mlx5e_tc_act mlx5e_tc_act_drop;
extern struct mlx5e_tc_act mlx5e_tc_act_trap;
extern struct mlx5e_tc_act mlx5e_tc_act_accept;

View File

@ -46,7 +46,7 @@ static void mlx5e_init_trap_rq(struct mlx5e_trap *t, struct mlx5e_params *params
rq->pdev = t->pdev;
rq->netdev = priv->netdev;
rq->priv = priv;
rq->clock = &mdev->clock;
rq->clock = mdev->clock;
rq->tstamp = &priv->tstamp;
rq->mdev = mdev;
rq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu);

View File

@ -289,9 +289,9 @@ static u64 mlx5e_xsk_fill_timestamp(void *_priv)
ts = get_cqe_ts(priv->cqe);
if (mlx5_is_real_time_rq(priv->cq->mdev) || mlx5_is_real_time_sq(priv->cq->mdev))
return mlx5_real_time_cyc2time(&priv->cq->mdev->clock, ts);
return mlx5_real_time_cyc2time(priv->cq->mdev->clock, ts);
return mlx5_timecounter_cyc2time(&priv->cq->mdev->clock, ts);
return mlx5_timecounter_cyc2time(priv->cq->mdev->clock, ts);
}
static void mlx5e_xsk_request_checksum(u16 csum_start, u16 csum_offset, void *priv)

View File

@ -72,7 +72,7 @@ static int mlx5e_init_xsk_rq(struct mlx5e_channel *c,
rq->netdev = c->netdev;
rq->priv = c->priv;
rq->tstamp = c->tstamp;
rq->clock = &mdev->clock;
rq->clock = mdev->clock;
rq->icosq = &c->icosq;
rq->ix = c->ix;
rq->channel = c;

View File

@ -237,6 +237,27 @@ void mlx5e_build_ptys2ethtool_map(void)
ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT,
ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT,
ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT);
MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_200GAUI_1_200GBASE_CR1_KR1, ext,
ETHTOOL_LINK_MODE_200000baseCR_Full_BIT,
ETHTOOL_LINK_MODE_200000baseKR_Full_BIT,
ETHTOOL_LINK_MODE_200000baseDR_Full_BIT,
ETHTOOL_LINK_MODE_200000baseDR_2_Full_BIT,
ETHTOOL_LINK_MODE_200000baseSR_Full_BIT,
ETHTOOL_LINK_MODE_200000baseVR_Full_BIT);
MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_400GAUI_2_400GBASE_CR2_KR2, ext,
ETHTOOL_LINK_MODE_400000baseCR2_Full_BIT,
ETHTOOL_LINK_MODE_400000baseKR2_Full_BIT,
ETHTOOL_LINK_MODE_400000baseDR2_Full_BIT,
ETHTOOL_LINK_MODE_400000baseDR2_2_Full_BIT,
ETHTOOL_LINK_MODE_400000baseSR2_Full_BIT,
ETHTOOL_LINK_MODE_400000baseVR2_Full_BIT);
MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_800GAUI_4_800GBASE_CR4_KR4, ext,
ETHTOOL_LINK_MODE_800000baseCR4_Full_BIT,
ETHTOOL_LINK_MODE_800000baseKR4_Full_BIT,
ETHTOOL_LINK_MODE_800000baseDR4_Full_BIT,
ETHTOOL_LINK_MODE_800000baseDR4_2_Full_BIT,
ETHTOOL_LINK_MODE_800000baseSR4_Full_BIT,
ETHTOOL_LINK_MODE_800000baseVR4_Full_BIT);
}
static void mlx5e_ethtool_get_speed_arr(struct mlx5_core_dev *mdev,
@ -931,6 +952,7 @@ static const u32 pplm_fec_2_ethtool[] = {
[MLX5E_FEC_RS_528_514] = ETHTOOL_FEC_RS,
[MLX5E_FEC_RS_544_514] = ETHTOOL_FEC_RS,
[MLX5E_FEC_LLRS_272_257_1] = ETHTOOL_FEC_LLRS,
[MLX5E_FEC_RS_544_514_INTERLEAVED_QUAD] = ETHTOOL_FEC_RS,
};
static u32 pplm2ethtool_fec(u_long fec_mode, unsigned long size)

View File

@ -737,7 +737,7 @@ static int mlx5e_init_rxq_rq(struct mlx5e_channel *c, struct mlx5e_params *param
rq->netdev = c->netdev;
rq->priv = c->priv;
rq->tstamp = c->tstamp;
rq->clock = &mdev->clock;
rq->clock = mdev->clock;
rq->icosq = &c->icosq;
rq->ix = c->ix;
rq->channel = c;
@ -1614,7 +1614,7 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
int err;
sq->pdev = c->pdev;
sq->clock = &mdev->clock;
sq->clock = mdev->clock;
sq->mkey_be = c->mkey_be;
sq->netdev = c->netdev;
sq->mdev = c->mdev;
@ -3816,8 +3816,11 @@ static int mlx5e_setup_tc_mqprio(struct mlx5e_priv *priv,
/* MQPRIO is another toplevel qdisc that can't be attached
* simultaneously with the offloaded HTB.
*/
if (WARN_ON(mlx5e_selq_is_htb_enabled(&priv->selq)))
return -EINVAL;
if (mlx5e_selq_is_htb_enabled(&priv->selq)) {
NL_SET_ERR_MSG_MOD(mqprio->extack,
"MQPRIO cannot be configured when HTB offload is enabled.");
return -EOPNOTSUPP;
}
switch (mqprio->mode) {
case TC_MQPRIO_MODE_DCB:

View File

@ -97,7 +97,7 @@ static int mlx5_lag_create_port_sel_table(struct mlx5_lag *ldev,
mlx5_del_flow_rules(lag_definer->rules[idx]);
}
j = ldev->buckets;
};
}
goto destroy_fg;
}
}

View File

@ -43,6 +43,8 @@
#include <linux/cpufeature.h>
#endif /* CONFIG_X86 */
#define MLX5_RT_CLOCK_IDENTITY_SIZE MLX5_FLD_SZ_BYTES(mrtcq_reg, rt_clock_identity)
enum {
MLX5_PIN_MODE_IN = 0x0,
MLX5_PIN_MODE_OUT = 0x1,
@ -77,6 +79,56 @@ enum {
MLX5_MTUTC_OPERATION_ADJUST_TIME_EXTENDED_MAX = 200000,
};
struct mlx5_clock_dev_state {
struct mlx5_core_dev *mdev;
struct mlx5_devcom_comp_dev *compdev;
struct mlx5_nb pps_nb;
struct work_struct out_work;
};
struct mlx5_clock_priv {
struct mlx5_clock clock;
struct mlx5_core_dev *mdev;
struct mutex lock; /* protect mdev and used in PTP callbacks */
struct mlx5_core_dev *event_mdev;
};
static struct mlx5_clock_priv *clock_priv(struct mlx5_clock *clock)
{
return container_of(clock, struct mlx5_clock_priv, clock);
}
static void mlx5_clock_lockdep_assert(struct mlx5_clock *clock)
{
if (!clock->shared)
return;
lockdep_assert(lockdep_is_held(&clock_priv(clock)->lock));
}
static struct mlx5_core_dev *mlx5_clock_mdev_get(struct mlx5_clock *clock)
{
mlx5_clock_lockdep_assert(clock);
return clock_priv(clock)->mdev;
}
static void mlx5_clock_lock(struct mlx5_clock *clock)
{
if (!clock->shared)
return;
mutex_lock(&clock_priv(clock)->lock);
}
static void mlx5_clock_unlock(struct mlx5_clock *clock)
{
if (!clock->shared)
return;
mutex_unlock(&clock_priv(clock)->lock);
}
static bool mlx5_real_time_mode(struct mlx5_core_dev *mdev)
{
return (mlx5_is_real_time_rq(mdev) || mlx5_is_real_time_sq(mdev));
@ -94,6 +146,22 @@ static bool mlx5_modify_mtutc_allowed(struct mlx5_core_dev *mdev)
return MLX5_CAP_MCAM_FEATURE(mdev, ptpcyc2realtime_modify);
}
static int mlx5_clock_identity_get(struct mlx5_core_dev *mdev,
u8 identify[MLX5_RT_CLOCK_IDENTITY_SIZE])
{
u32 out[MLX5_ST_SZ_DW(mrtcq_reg)] = {};
u32 in[MLX5_ST_SZ_DW(mrtcq_reg)] = {};
int err;
err = mlx5_core_access_reg(mdev, in, sizeof(in),
out, sizeof(out), MLX5_REG_MRTCQ, 0, 0);
if (!err)
memcpy(identify, MLX5_ADDR_OF(mrtcq_reg, out, rt_clock_identity),
MLX5_RT_CLOCK_IDENTITY_SIZE);
return err;
}
static u32 mlx5_ptp_shift_constant(u32 dev_freq_khz)
{
/* Optimal shift constant leads to corrections above just 1 scaled ppm.
@ -119,21 +187,30 @@ static u32 mlx5_ptp_shift_constant(u32 dev_freq_khz)
ilog2((U32_MAX / NSEC_PER_MSEC) * dev_freq_khz));
}
static s32 mlx5_ptp_getmaxphase(struct ptp_clock_info *ptp)
static s32 mlx5_clock_getmaxphase(struct mlx5_core_dev *mdev)
{
struct mlx5_clock *clock = container_of(ptp, struct mlx5_clock, ptp_info);
struct mlx5_core_dev *mdev;
mdev = container_of(clock, struct mlx5_core_dev, clock);
return MLX5_CAP_MCAM_FEATURE(mdev, mtutc_time_adjustment_extended_range) ?
MLX5_MTUTC_OPERATION_ADJUST_TIME_EXTENDED_MAX :
MLX5_MTUTC_OPERATION_ADJUST_TIME_MAX;
}
static s32 mlx5_ptp_getmaxphase(struct ptp_clock_info *ptp)
{
struct mlx5_clock *clock = container_of(ptp, struct mlx5_clock, ptp_info);
struct mlx5_core_dev *mdev;
s32 ret;
mlx5_clock_lock(clock);
mdev = mlx5_clock_mdev_get(clock);
ret = mlx5_clock_getmaxphase(mdev);
mlx5_clock_unlock(clock);
return ret;
}
static bool mlx5_is_mtutc_time_adj_cap(struct mlx5_core_dev *mdev, s64 delta)
{
s64 max = mlx5_ptp_getmaxphase(&mdev->clock.ptp_info);
s64 max = mlx5_clock_getmaxphase(mdev);
if (delta < -max || delta > max)
return false;
@ -209,7 +286,7 @@ static int mlx5_mtctr_syncdevicetime(ktime_t *device_time,
if (real_time_mode)
*device_time = ns_to_ktime(REAL_TIME_TO_NS(device >> 32, device & U32_MAX));
else
*device_time = mlx5_timecounter_cyc2time(&mdev->clock, device);
*device_time = mlx5_timecounter_cyc2time(mdev->clock, device);
return 0;
}
@ -220,16 +297,23 @@ static int mlx5_ptp_getcrosststamp(struct ptp_clock_info *ptp,
struct mlx5_clock *clock = container_of(ptp, struct mlx5_clock, ptp_info);
struct system_time_snapshot history_begin = {0};
struct mlx5_core_dev *mdev;
int err;
mdev = container_of(clock, struct mlx5_core_dev, clock);
mlx5_clock_lock(clock);
mdev = mlx5_clock_mdev_get(clock);
if (!mlx5_is_ptm_source_time_available(mdev))
return -EBUSY;
if (!mlx5_is_ptm_source_time_available(mdev)) {
err = -EBUSY;
goto unlock;
}
ktime_get_snapshot(&history_begin);
return get_device_system_crosststamp(mlx5_mtctr_syncdevicetime, mdev,
&history_begin, cts);
err = get_device_system_crosststamp(mlx5_mtctr_syncdevicetime, mdev,
&history_begin, cts);
unlock:
mlx5_clock_unlock(clock);
return err;
}
#endif /* CONFIG_X86 */
@ -263,8 +347,7 @@ static u64 read_internal_timer(const struct cyclecounter *cc)
{
struct mlx5_timer *timer = container_of(cc, struct mlx5_timer, cycles);
struct mlx5_clock *clock = container_of(timer, struct mlx5_clock, timer);
struct mlx5_core_dev *mdev = container_of(clock, struct mlx5_core_dev,
clock);
struct mlx5_core_dev *mdev = mlx5_clock_mdev_get(clock);
return mlx5_read_time(mdev, NULL, false) & cc->mask;
}
@ -272,7 +355,7 @@ static u64 read_internal_timer(const struct cyclecounter *cc)
static void mlx5_update_clock_info_page(struct mlx5_core_dev *mdev)
{
struct mlx5_ib_clock_info *clock_info = mdev->clock_info;
struct mlx5_clock *clock = &mdev->clock;
struct mlx5_clock *clock = mdev->clock;
struct mlx5_timer *timer;
u32 sign;
@ -295,12 +378,10 @@ static void mlx5_update_clock_info_page(struct mlx5_core_dev *mdev)
static void mlx5_pps_out(struct work_struct *work)
{
struct mlx5_pps *pps_info = container_of(work, struct mlx5_pps,
out_work);
struct mlx5_clock *clock = container_of(pps_info, struct mlx5_clock,
pps_info);
struct mlx5_core_dev *mdev = container_of(clock, struct mlx5_core_dev,
clock);
struct mlx5_clock_dev_state *clock_state = container_of(work, struct mlx5_clock_dev_state,
out_work);
struct mlx5_core_dev *mdev = clock_state->mdev;
struct mlx5_clock *clock = mdev->clock;
u32 in[MLX5_ST_SZ_DW(mtpps_reg)] = {0};
unsigned long flags;
int i;
@ -330,7 +411,8 @@ static long mlx5_timestamp_overflow(struct ptp_clock_info *ptp_info)
unsigned long flags;
clock = container_of(ptp_info, struct mlx5_clock, ptp_info);
mdev = container_of(clock, struct mlx5_core_dev, clock);
mlx5_clock_lock(clock);
mdev = mlx5_clock_mdev_get(clock);
timer = &clock->timer;
if (mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
@ -342,6 +424,7 @@ static long mlx5_timestamp_overflow(struct ptp_clock_info *ptp_info)
write_sequnlock_irqrestore(&clock->lock, flags);
out:
mlx5_clock_unlock(clock);
return timer->overflow_period;
}
@ -361,15 +444,12 @@ static int mlx5_ptp_settime_real_time(struct mlx5_core_dev *mdev,
return mlx5_set_mtutc(mdev, in, sizeof(in));
}
static int mlx5_ptp_settime(struct ptp_clock_info *ptp, const struct timespec64 *ts)
static int mlx5_clock_settime(struct mlx5_core_dev *mdev, struct mlx5_clock *clock,
const struct timespec64 *ts)
{
struct mlx5_clock *clock = container_of(ptp, struct mlx5_clock, ptp_info);
struct mlx5_timer *timer = &clock->timer;
struct mlx5_core_dev *mdev;
unsigned long flags;
mdev = container_of(clock, struct mlx5_core_dev, clock);
if (mlx5_modify_mtutc_allowed(mdev)) {
int err = mlx5_ptp_settime_real_time(mdev, ts);
@ -385,6 +465,20 @@ static int mlx5_ptp_settime(struct ptp_clock_info *ptp, const struct timespec64
return 0;
}
static int mlx5_ptp_settime(struct ptp_clock_info *ptp, const struct timespec64 *ts)
{
struct mlx5_clock *clock = container_of(ptp, struct mlx5_clock, ptp_info);
struct mlx5_core_dev *mdev;
int err;
mlx5_clock_lock(clock);
mdev = mlx5_clock_mdev_get(clock);
err = mlx5_clock_settime(mdev, clock, ts);
mlx5_clock_unlock(clock);
return err;
}
static
struct timespec64 mlx5_ptp_gettimex_real_time(struct mlx5_core_dev *mdev,
struct ptp_system_timestamp *sts)
@ -404,7 +498,8 @@ static int mlx5_ptp_gettimex(struct ptp_clock_info *ptp, struct timespec64 *ts,
struct mlx5_core_dev *mdev;
u64 cycles, ns;
mdev = container_of(clock, struct mlx5_core_dev, clock);
mlx5_clock_lock(clock);
mdev = mlx5_clock_mdev_get(clock);
if (mlx5_real_time_mode(mdev)) {
*ts = mlx5_ptp_gettimex_real_time(mdev, sts);
goto out;
@ -414,6 +509,7 @@ static int mlx5_ptp_gettimex(struct ptp_clock_info *ptp, struct timespec64 *ts,
ns = mlx5_timecounter_cyc2time(clock, cycles);
*ts = ns_to_timespec64(ns);
out:
mlx5_clock_unlock(clock);
return 0;
}
@ -444,14 +540,16 @@ static int mlx5_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
struct mlx5_timer *timer = &clock->timer;
struct mlx5_core_dev *mdev;
unsigned long flags;
int err = 0;
mdev = container_of(clock, struct mlx5_core_dev, clock);
mlx5_clock_lock(clock);
mdev = mlx5_clock_mdev_get(clock);
if (mlx5_modify_mtutc_allowed(mdev)) {
int err = mlx5_ptp_adjtime_real_time(mdev, delta);
err = mlx5_ptp_adjtime_real_time(mdev, delta);
if (err)
return err;
goto unlock;
}
write_seqlock_irqsave(&clock->lock, flags);
@ -459,17 +557,23 @@ static int mlx5_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
mlx5_update_clock_info_page(mdev);
write_sequnlock_irqrestore(&clock->lock, flags);
return 0;
unlock:
mlx5_clock_unlock(clock);
return err;
}
static int mlx5_ptp_adjphase(struct ptp_clock_info *ptp, s32 delta)
{
struct mlx5_clock *clock = container_of(ptp, struct mlx5_clock, ptp_info);
struct mlx5_core_dev *mdev;
int err;
mdev = container_of(clock, struct mlx5_core_dev, clock);
mlx5_clock_lock(clock);
mdev = mlx5_clock_mdev_get(clock);
err = mlx5_ptp_adjtime_real_time(mdev, delta);
mlx5_clock_unlock(clock);
return mlx5_ptp_adjtime_real_time(mdev, delta);
return err;
}
static int mlx5_ptp_freq_adj_real_time(struct mlx5_core_dev *mdev, long scaled_ppm)
@ -498,15 +602,17 @@ static int mlx5_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
struct mlx5_timer *timer = &clock->timer;
struct mlx5_core_dev *mdev;
unsigned long flags;
int err = 0;
u32 mult;
mdev = container_of(clock, struct mlx5_core_dev, clock);
mlx5_clock_lock(clock);
mdev = mlx5_clock_mdev_get(clock);
if (mlx5_modify_mtutc_allowed(mdev)) {
int err = mlx5_ptp_freq_adj_real_time(mdev, scaled_ppm);
err = mlx5_ptp_freq_adj_real_time(mdev, scaled_ppm);
if (err)
return err;
goto unlock;
}
mult = (u32)adjust_by_scaled_ppm(timer->nominal_c_mult, scaled_ppm);
@ -518,7 +624,9 @@ static int mlx5_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
write_sequnlock_irqrestore(&clock->lock, flags);
ptp_schedule_worker(clock->ptp, timer->overflow_period);
return 0;
unlock:
mlx5_clock_unlock(clock);
return err;
}
static int mlx5_extts_configure(struct ptp_clock_info *ptp,
@ -527,18 +635,14 @@ static int mlx5_extts_configure(struct ptp_clock_info *ptp,
{
struct mlx5_clock *clock =
container_of(ptp, struct mlx5_clock, ptp_info);
struct mlx5_core_dev *mdev =
container_of(clock, struct mlx5_core_dev, clock);
u32 in[MLX5_ST_SZ_DW(mtpps_reg)] = {0};
struct mlx5_core_dev *mdev;
u32 field_select = 0;
u8 pin_mode = 0;
u8 pattern = 0;
int pin = -1;
int err = 0;
if (!MLX5_PPS_CAP(mdev))
return -EOPNOTSUPP;
/* Reject requests with unsupported flags */
if (rq->extts.flags & ~(PTP_ENABLE_FEATURE |
PTP_RISING_EDGE |
@ -569,6 +673,14 @@ static int mlx5_extts_configure(struct ptp_clock_info *ptp,
field_select = MLX5_MTPPS_FS_ENABLE;
}
mlx5_clock_lock(clock);
mdev = mlx5_clock_mdev_get(clock);
if (!MLX5_PPS_CAP(mdev)) {
err = -EOPNOTSUPP;
goto unlock;
}
MLX5_SET(mtpps_reg, in, pin, pin);
MLX5_SET(mtpps_reg, in, pin_mode, pin_mode);
MLX5_SET(mtpps_reg, in, pattern, pattern);
@ -577,15 +689,23 @@ static int mlx5_extts_configure(struct ptp_clock_info *ptp,
err = mlx5_set_mtpps(mdev, in, sizeof(in));
if (err)
return err;
goto unlock;
return mlx5_set_mtppse(mdev, pin, 0,
MLX5_EVENT_MODE_REPETETIVE & on);
err = mlx5_set_mtppse(mdev, pin, 0, MLX5_EVENT_MODE_REPETETIVE & on);
if (err)
goto unlock;
clock->pps_info.pin_armed[pin] = on;
clock_priv(clock)->event_mdev = mdev;
unlock:
mlx5_clock_unlock(clock);
return err;
}
static u64 find_target_cycles(struct mlx5_core_dev *mdev, s64 target_ns)
{
struct mlx5_clock *clock = &mdev->clock;
struct mlx5_clock *clock = mdev->clock;
u64 cycles_now, cycles_delta;
u64 nsec_now, nsec_delta;
struct mlx5_timer *timer;
@ -644,7 +764,7 @@ static int mlx5_perout_conf_out_pulse_duration(struct mlx5_core_dev *mdev,
struct ptp_clock_request *rq,
u32 *out_pulse_duration_ns)
{
struct mlx5_pps *pps_info = &mdev->clock.pps_info;
struct mlx5_pps *pps_info = &mdev->clock->pps_info;
u32 out_pulse_duration;
struct timespec64 ts;
@ -677,7 +797,7 @@ static int perout_conf_npps_real_time(struct mlx5_core_dev *mdev, struct ptp_clo
u32 *field_select, u32 *out_pulse_duration_ns,
u64 *period, u64 *time_stamp)
{
struct mlx5_pps *pps_info = &mdev->clock.pps_info;
struct mlx5_pps *pps_info = &mdev->clock->pps_info;
struct ptp_clock_time *time = &rq->perout.start;
struct timespec64 ts;
@ -712,26 +832,18 @@ static int mlx5_perout_configure(struct ptp_clock_info *ptp,
{
struct mlx5_clock *clock =
container_of(ptp, struct mlx5_clock, ptp_info);
struct mlx5_core_dev *mdev =
container_of(clock, struct mlx5_core_dev, clock);
bool rt_mode = mlx5_real_time_mode(mdev);
u32 in[MLX5_ST_SZ_DW(mtpps_reg)] = {0};
u32 out_pulse_duration_ns = 0;
struct mlx5_core_dev *mdev;
u32 field_select = 0;
u64 npps_period = 0;
u64 time_stamp = 0;
u8 pin_mode = 0;
u8 pattern = 0;
bool rt_mode;
int pin = -1;
int err = 0;
if (!MLX5_PPS_CAP(mdev))
return -EOPNOTSUPP;
/* Reject requests with unsupported flags */
if (mlx5_perout_verify_flags(mdev, rq->perout.flags))
return -EOPNOTSUPP;
if (rq->perout.index >= clock->ptp_info.n_pins)
return -EINVAL;
@ -740,14 +852,29 @@ static int mlx5_perout_configure(struct ptp_clock_info *ptp,
if (pin < 0)
return -EBUSY;
if (on) {
bool rt_mode = mlx5_real_time_mode(mdev);
mlx5_clock_lock(clock);
mdev = mlx5_clock_mdev_get(clock);
rt_mode = mlx5_real_time_mode(mdev);
if (!MLX5_PPS_CAP(mdev)) {
err = -EOPNOTSUPP;
goto unlock;
}
/* Reject requests with unsupported flags */
if (mlx5_perout_verify_flags(mdev, rq->perout.flags)) {
err = -EOPNOTSUPP;
goto unlock;
}
if (on) {
pin_mode = MLX5_PIN_MODE_OUT;
pattern = MLX5_OUT_PATTERN_PERIODIC;
if (rt_mode && rq->perout.start.sec > U32_MAX)
return -EINVAL;
if (rt_mode && rq->perout.start.sec > U32_MAX) {
err = -EINVAL;
goto unlock;
}
field_select |= MLX5_MTPPS_FS_PIN_MODE |
MLX5_MTPPS_FS_PATTERN |
@ -760,7 +887,7 @@ static int mlx5_perout_configure(struct ptp_clock_info *ptp,
else
err = perout_conf_1pps(mdev, rq, &time_stamp, rt_mode);
if (err)
return err;
goto unlock;
}
MLX5_SET(mtpps_reg, in, pin, pin);
@ -773,13 +900,16 @@ static int mlx5_perout_configure(struct ptp_clock_info *ptp,
MLX5_SET(mtpps_reg, in, out_pulse_duration_ns, out_pulse_duration_ns);
err = mlx5_set_mtpps(mdev, in, sizeof(in));
if (err)
return err;
goto unlock;
if (rt_mode)
return 0;
goto unlock;
return mlx5_set_mtppse(mdev, pin, 0,
MLX5_EVENT_MODE_REPETETIVE & on);
err = mlx5_set_mtppse(mdev, pin, 0, MLX5_EVENT_MODE_REPETETIVE & on);
unlock:
mlx5_clock_unlock(clock);
return err;
}
static int mlx5_pps_configure(struct ptp_clock_info *ptp,
@ -866,10 +996,8 @@ static int mlx5_query_mtpps_pin_mode(struct mlx5_core_dev *mdev, u8 pin,
mtpps_size, MLX5_REG_MTPPS, 0, 0);
}
static int mlx5_get_pps_pin_mode(struct mlx5_clock *clock, u8 pin)
static int mlx5_get_pps_pin_mode(struct mlx5_core_dev *mdev, u8 pin)
{
struct mlx5_core_dev *mdev = container_of(clock, struct mlx5_core_dev, clock);
u32 out[MLX5_ST_SZ_DW(mtpps_reg)] = {};
u8 mode;
int err;
@ -888,8 +1016,9 @@ static int mlx5_get_pps_pin_mode(struct mlx5_clock *clock, u8 pin)
return PTP_PF_NONE;
}
static void mlx5_init_pin_config(struct mlx5_clock *clock)
static void mlx5_init_pin_config(struct mlx5_core_dev *mdev)
{
struct mlx5_clock *clock = mdev->clock;
int i;
if (!clock->ptp_info.n_pins)
@ -910,15 +1039,15 @@ static void mlx5_init_pin_config(struct mlx5_clock *clock)
sizeof(clock->ptp_info.pin_config[i].name),
"mlx5_pps%d", i);
clock->ptp_info.pin_config[i].index = i;
clock->ptp_info.pin_config[i].func = mlx5_get_pps_pin_mode(clock, i);
clock->ptp_info.pin_config[i].func = mlx5_get_pps_pin_mode(mdev, i);
clock->ptp_info.pin_config[i].chan = 0;
}
}
static void mlx5_get_pps_caps(struct mlx5_core_dev *mdev)
{
struct mlx5_clock *clock = &mdev->clock;
u32 out[MLX5_ST_SZ_DW(mtpps_reg)] = {0};
struct mlx5_clock *clock = mdev->clock;
mlx5_query_mtpps(mdev, out, sizeof(out));
@ -968,16 +1097,16 @@ static u64 perout_conf_next_event_timer(struct mlx5_core_dev *mdev,
static int mlx5_pps_event(struct notifier_block *nb,
unsigned long type, void *data)
{
struct mlx5_clock *clock = mlx5_nb_cof(nb, struct mlx5_clock, pps_nb);
struct mlx5_clock_dev_state *clock_state = mlx5_nb_cof(nb, struct mlx5_clock_dev_state,
pps_nb);
struct mlx5_core_dev *mdev = clock_state->mdev;
struct mlx5_clock *clock = mdev->clock;
struct ptp_clock_event ptp_event;
struct mlx5_eqe *eqe = data;
int pin = eqe->data.pps.pin;
struct mlx5_core_dev *mdev;
unsigned long flags;
u64 ns;
mdev = container_of(clock, struct mlx5_core_dev, clock);
switch (clock->ptp_info.pin_config[pin].func) {
case PTP_PF_EXTTS:
ptp_event.index = pin;
@ -997,11 +1126,15 @@ static int mlx5_pps_event(struct notifier_block *nb,
ptp_clock_event(clock->ptp, &ptp_event);
break;
case PTP_PF_PEROUT:
if (clock->shared) {
mlx5_core_warn(mdev, " Received unexpected PPS out event\n");
break;
}
ns = perout_conf_next_event_timer(mdev, clock);
write_seqlock_irqsave(&clock->lock, flags);
clock->pps_info.start[pin] = ns;
write_sequnlock_irqrestore(&clock->lock, flags);
schedule_work(&clock->pps_info.out_work);
schedule_work(&clock_state->out_work);
break;
default:
mlx5_core_err(mdev, " Unhandled clock PPS event, func %d\n",
@ -1013,7 +1146,7 @@ static int mlx5_pps_event(struct notifier_block *nb,
static void mlx5_timecounter_init(struct mlx5_core_dev *mdev)
{
struct mlx5_clock *clock = &mdev->clock;
struct mlx5_clock *clock = mdev->clock;
struct mlx5_timer *timer = &clock->timer;
u32 dev_freq;
@ -1029,10 +1162,10 @@ static void mlx5_timecounter_init(struct mlx5_core_dev *mdev)
ktime_to_ns(ktime_get_real()));
}
static void mlx5_init_overflow_period(struct mlx5_clock *clock)
static void mlx5_init_overflow_period(struct mlx5_core_dev *mdev)
{
struct mlx5_core_dev *mdev = container_of(clock, struct mlx5_core_dev, clock);
struct mlx5_ib_clock_info *clock_info = mdev->clock_info;
struct mlx5_clock *clock = mdev->clock;
struct mlx5_timer *timer = &clock->timer;
u64 overflow_cycles;
u64 frac = 0;
@ -1065,7 +1198,7 @@ static void mlx5_init_overflow_period(struct mlx5_clock *clock)
static void mlx5_init_clock_info(struct mlx5_core_dev *mdev)
{
struct mlx5_clock *clock = &mdev->clock;
struct mlx5_clock *clock = mdev->clock;
struct mlx5_ib_clock_info *info;
struct mlx5_timer *timer;
@ -1088,7 +1221,7 @@ static void mlx5_init_clock_info(struct mlx5_core_dev *mdev)
static void mlx5_init_timer_max_freq_adjustment(struct mlx5_core_dev *mdev)
{
struct mlx5_clock *clock = &mdev->clock;
struct mlx5_clock *clock = mdev->clock;
u32 out[MLX5_ST_SZ_DW(mtutc_reg)] = {};
u32 in[MLX5_ST_SZ_DW(mtutc_reg)] = {};
u8 log_max_freq_adjustment = 0;
@ -1107,7 +1240,7 @@ static void mlx5_init_timer_max_freq_adjustment(struct mlx5_core_dev *mdev)
static void mlx5_init_timer_clock(struct mlx5_core_dev *mdev)
{
struct mlx5_clock *clock = &mdev->clock;
struct mlx5_clock *clock = mdev->clock;
/* Configure the PHC */
clock->ptp_info = mlx5_ptp_clock_info;
@ -1123,38 +1256,30 @@ static void mlx5_init_timer_clock(struct mlx5_core_dev *mdev)
mlx5_timecounter_init(mdev);
mlx5_init_clock_info(mdev);
mlx5_init_overflow_period(clock);
mlx5_init_overflow_period(mdev);
if (mlx5_real_time_mode(mdev)) {
struct timespec64 ts;
ktime_get_real_ts64(&ts);
mlx5_ptp_settime(&clock->ptp_info, &ts);
mlx5_clock_settime(mdev, clock, &ts);
}
}
static void mlx5_init_pps(struct mlx5_core_dev *mdev)
{
struct mlx5_clock *clock = &mdev->clock;
if (!MLX5_PPS_CAP(mdev))
return;
mlx5_get_pps_caps(mdev);
mlx5_init_pin_config(clock);
mlx5_init_pin_config(mdev);
}
void mlx5_init_clock(struct mlx5_core_dev *mdev)
static void mlx5_init_clock_dev(struct mlx5_core_dev *mdev)
{
struct mlx5_clock *clock = &mdev->clock;
if (!MLX5_CAP_GEN(mdev, device_frequency_khz)) {
mlx5_core_warn(mdev, "invalid device_frequency_khz, aborting HW clock init\n");
return;
}
struct mlx5_clock *clock = mdev->clock;
seqlock_init(&clock->lock);
INIT_WORK(&clock->pps_info.out_work, mlx5_pps_out);
/* Initialize the device clock */
mlx5_init_timer_clock(mdev);
@ -1163,35 +1288,27 @@ void mlx5_init_clock(struct mlx5_core_dev *mdev)
mlx5_init_pps(mdev);
clock->ptp = ptp_clock_register(&clock->ptp_info,
&mdev->pdev->dev);
clock->shared ? NULL : &mdev->pdev->dev);
if (IS_ERR(clock->ptp)) {
mlx5_core_warn(mdev, "ptp_clock_register failed %ld\n",
mlx5_core_warn(mdev, "%sptp_clock_register failed %ld\n",
clock->shared ? "shared clock " : "",
PTR_ERR(clock->ptp));
clock->ptp = NULL;
}
MLX5_NB_INIT(&clock->pps_nb, mlx5_pps_event, PPS_EVENT);
mlx5_eq_notifier_register(mdev, &clock->pps_nb);
if (clock->ptp)
ptp_schedule_worker(clock->ptp, 0);
}
void mlx5_cleanup_clock(struct mlx5_core_dev *mdev)
static void mlx5_destroy_clock_dev(struct mlx5_core_dev *mdev)
{
struct mlx5_clock *clock = &mdev->clock;
struct mlx5_clock *clock = mdev->clock;
if (!MLX5_CAP_GEN(mdev, device_frequency_khz))
return;
mlx5_eq_notifier_unregister(mdev, &clock->pps_nb);
if (clock->ptp) {
ptp_clock_unregister(clock->ptp);
clock->ptp = NULL;
}
cancel_work_sync(&clock->pps_info.out_work);
if (mdev->clock_info) {
free_page((unsigned long)mdev->clock_info);
mdev->clock_info = NULL;
@ -1199,3 +1316,248 @@ void mlx5_cleanup_clock(struct mlx5_core_dev *mdev)
kfree(clock->ptp_info.pin_config);
}
static void mlx5_clock_free(struct mlx5_core_dev *mdev)
{
struct mlx5_clock_priv *cpriv = clock_priv(mdev->clock);
mlx5_destroy_clock_dev(mdev);
mutex_destroy(&cpriv->lock);
kfree(cpriv);
mdev->clock = NULL;
}
static int mlx5_clock_alloc(struct mlx5_core_dev *mdev, bool shared)
{
struct mlx5_clock_priv *cpriv;
struct mlx5_clock *clock;
cpriv = kzalloc(sizeof(*cpriv), GFP_KERNEL);
if (!cpriv)
return -ENOMEM;
mutex_init(&cpriv->lock);
cpriv->mdev = mdev;
clock = &cpriv->clock;
clock->shared = shared;
mdev->clock = clock;
mlx5_clock_lock(clock);
mlx5_init_clock_dev(mdev);
mlx5_clock_unlock(clock);
if (!clock->shared)
return 0;
if (!clock->ptp) {
mlx5_core_warn(mdev, "failed to create ptp dev shared by multiple functions");
mlx5_clock_free(mdev);
return -EINVAL;
}
return 0;
}
static void mlx5_shared_clock_register(struct mlx5_core_dev *mdev, u64 key)
{
struct mlx5_core_dev *peer_dev, *next = NULL;
struct mlx5_devcom_comp_dev *pos;
mdev->clock_state->compdev = mlx5_devcom_register_component(mdev->priv.devc,
MLX5_DEVCOM_SHARED_CLOCK,
key, NULL, mdev);
if (IS_ERR(mdev->clock_state->compdev))
return;
mlx5_devcom_comp_lock(mdev->clock_state->compdev);
mlx5_devcom_for_each_peer_entry(mdev->clock_state->compdev, peer_dev, pos) {
if (peer_dev->clock) {
next = peer_dev;
break;
}
}
if (next) {
mdev->clock = next->clock;
/* clock info is shared among all the functions using the same clock */
mdev->clock_info = next->clock_info;
} else {
mlx5_clock_alloc(mdev, true);
}
mlx5_devcom_comp_unlock(mdev->clock_state->compdev);
if (!mdev->clock) {
mlx5_devcom_unregister_component(mdev->clock_state->compdev);
mdev->clock_state->compdev = NULL;
}
}
static void mlx5_shared_clock_unregister(struct mlx5_core_dev *mdev)
{
struct mlx5_core_dev *peer_dev, *next = NULL;
struct mlx5_clock *clock = mdev->clock;
struct mlx5_devcom_comp_dev *pos;
mlx5_devcom_comp_lock(mdev->clock_state->compdev);
mlx5_devcom_for_each_peer_entry(mdev->clock_state->compdev, peer_dev, pos) {
if (peer_dev->clock && peer_dev != mdev) {
next = peer_dev;
break;
}
}
if (next) {
struct mlx5_clock_priv *cpriv = clock_priv(clock);
mlx5_clock_lock(clock);
if (mdev == cpriv->mdev)
cpriv->mdev = next;
mlx5_clock_unlock(clock);
} else {
mlx5_clock_free(mdev);
}
mdev->clock = NULL;
mdev->clock_info = NULL;
mlx5_devcom_comp_unlock(mdev->clock_state->compdev);
mlx5_devcom_unregister_component(mdev->clock_state->compdev);
}
static void mlx5_clock_arm_pps_in_event(struct mlx5_clock *clock,
struct mlx5_core_dev *new_mdev,
struct mlx5_core_dev *old_mdev)
{
struct ptp_clock_info *ptp_info = &clock->ptp_info;
struct mlx5_clock_priv *cpriv = clock_priv(clock);
int i;
for (i = 0; i < ptp_info->n_pins; i++) {
if (ptp_info->pin_config[i].func != PTP_PF_EXTTS ||
!clock->pps_info.pin_armed[i])
continue;
if (new_mdev) {
mlx5_set_mtppse(new_mdev, i, 0, MLX5_EVENT_MODE_REPETETIVE);
cpriv->event_mdev = new_mdev;
} else {
cpriv->event_mdev = NULL;
}
if (old_mdev)
mlx5_set_mtppse(old_mdev, i, 0, MLX5_EVENT_MODE_DISABLE);
}
}
void mlx5_clock_load(struct mlx5_core_dev *mdev)
{
struct mlx5_clock *clock = mdev->clock;
struct mlx5_clock_priv *cpriv;
if (!MLX5_CAP_GEN(mdev, device_frequency_khz))
return;
INIT_WORK(&mdev->clock_state->out_work, mlx5_pps_out);
MLX5_NB_INIT(&mdev->clock_state->pps_nb, mlx5_pps_event, PPS_EVENT);
mlx5_eq_notifier_register(mdev, &mdev->clock_state->pps_nb);
if (!clock->shared) {
mlx5_clock_arm_pps_in_event(clock, mdev, NULL);
return;
}
cpriv = clock_priv(clock);
mlx5_devcom_comp_lock(mdev->clock_state->compdev);
mlx5_clock_lock(clock);
if (mdev == cpriv->mdev && mdev != cpriv->event_mdev)
mlx5_clock_arm_pps_in_event(clock, mdev, cpriv->event_mdev);
mlx5_clock_unlock(clock);
mlx5_devcom_comp_unlock(mdev->clock_state->compdev);
}
void mlx5_clock_unload(struct mlx5_core_dev *mdev)
{
struct mlx5_core_dev *peer_dev, *next = NULL;
struct mlx5_clock *clock = mdev->clock;
struct mlx5_devcom_comp_dev *pos;
if (!MLX5_CAP_GEN(mdev, device_frequency_khz))
return;
if (!clock->shared) {
mlx5_clock_arm_pps_in_event(clock, NULL, mdev);
goto out;
}
mlx5_devcom_comp_lock(mdev->clock_state->compdev);
mlx5_devcom_for_each_peer_entry(mdev->clock_state->compdev, peer_dev, pos) {
if (peer_dev->clock && peer_dev != mdev) {
next = peer_dev;
break;
}
}
mlx5_clock_lock(clock);
if (mdev == clock_priv(clock)->event_mdev)
mlx5_clock_arm_pps_in_event(clock, next, mdev);
mlx5_clock_unlock(clock);
mlx5_devcom_comp_unlock(mdev->clock_state->compdev);
out:
mlx5_eq_notifier_unregister(mdev, &mdev->clock_state->pps_nb);
cancel_work_sync(&mdev->clock_state->out_work);
}
static struct mlx5_clock null_clock;
int mlx5_init_clock(struct mlx5_core_dev *mdev)
{
u8 identity[MLX5_RT_CLOCK_IDENTITY_SIZE];
struct mlx5_clock_dev_state *clock_state;
u64 key;
int err;
if (!MLX5_CAP_GEN(mdev, device_frequency_khz)) {
mdev->clock = &null_clock;
mlx5_core_warn(mdev, "invalid device_frequency_khz, aborting HW clock init\n");
return 0;
}
clock_state = kzalloc(sizeof(*clock_state), GFP_KERNEL);
if (!clock_state)
return -ENOMEM;
clock_state->mdev = mdev;
mdev->clock_state = clock_state;
if (MLX5_CAP_MCAM_REG3(mdev, mrtcq) && mlx5_real_time_mode(mdev)) {
if (mlx5_clock_identity_get(mdev, identity)) {
mlx5_core_warn(mdev, "failed to get rt clock identity, create ptp dev per function\n");
} else {
memcpy(&key, &identity, sizeof(key));
mlx5_shared_clock_register(mdev, key);
}
}
if (!mdev->clock) {
err = mlx5_clock_alloc(mdev, false);
if (err) {
kfree(clock_state);
mdev->clock_state = NULL;
return err;
}
}
return 0;
}
void mlx5_cleanup_clock(struct mlx5_core_dev *mdev)
{
if (!MLX5_CAP_GEN(mdev, device_frequency_khz))
return;
if (mdev->clock->shared)
mlx5_shared_clock_unregister(mdev);
else
mlx5_clock_free(mdev);
kfree(mdev->clock_state);
mdev->clock_state = NULL;
}

View File

@ -33,6 +33,35 @@
#ifndef __LIB_CLOCK_H__
#define __LIB_CLOCK_H__
#include <linux/ptp_clock_kernel.h>
#define MAX_PIN_NUM 8
struct mlx5_pps {
u8 pin_caps[MAX_PIN_NUM];
u64 start[MAX_PIN_NUM];
u8 enabled;
u64 min_npps_period;
u64 min_out_pulse_duration_ns;
bool pin_armed[MAX_PIN_NUM];
};
struct mlx5_timer {
struct cyclecounter cycles;
struct timecounter tc;
u32 nominal_c_mult;
unsigned long overflow_period;
};
struct mlx5_clock {
seqlock_t lock;
struct hwtstamp_config hwtstamp_config;
struct ptp_clock *ptp;
struct ptp_clock_info ptp_info;
struct mlx5_pps pps_info;
struct mlx5_timer timer;
bool shared;
};
static inline bool mlx5_is_real_time_rq(struct mlx5_core_dev *mdev)
{
u8 rq_ts_format_cap = MLX5_CAP_GEN(mdev, rq_ts_format);
@ -54,12 +83,14 @@ static inline bool mlx5_is_real_time_sq(struct mlx5_core_dev *mdev)
typedef ktime_t (*cqe_ts_to_ns)(struct mlx5_clock *, u64);
#if IS_ENABLED(CONFIG_PTP_1588_CLOCK)
void mlx5_init_clock(struct mlx5_core_dev *mdev);
int mlx5_init_clock(struct mlx5_core_dev *mdev);
void mlx5_cleanup_clock(struct mlx5_core_dev *mdev);
void mlx5_clock_load(struct mlx5_core_dev *mdev);
void mlx5_clock_unload(struct mlx5_core_dev *mdev);
static inline int mlx5_clock_get_ptp_index(struct mlx5_core_dev *mdev)
{
return mdev->clock.ptp ? ptp_clock_index(mdev->clock.ptp) : -1;
return mdev->clock->ptp ? ptp_clock_index(mdev->clock->ptp) : -1;
}
static inline ktime_t mlx5_timecounter_cyc2time(struct mlx5_clock *clock,
@ -87,8 +118,10 @@ static inline ktime_t mlx5_real_time_cyc2time(struct mlx5_clock *clock,
return ns_to_ktime(time);
}
#else
static inline void mlx5_init_clock(struct mlx5_core_dev *mdev) {}
static inline int mlx5_init_clock(struct mlx5_core_dev *mdev) { return 0; }
static inline void mlx5_cleanup_clock(struct mlx5_core_dev *mdev) {}
static inline void mlx5_clock_load(struct mlx5_core_dev *mdev) {}
static inline void mlx5_clock_unload(struct mlx5_core_dev *mdev) {}
static inline int mlx5_clock_get_ptp_index(struct mlx5_core_dev *mdev)
{
return -1;

View File

@ -11,6 +11,7 @@ enum mlx5_devcom_component {
MLX5_DEVCOM_MPV,
MLX5_DEVCOM_HCA_PORTS,
MLX5_DEVCOM_SD_GROUP,
MLX5_DEVCOM_SHARED_CLOCK,
MLX5_DEVCOM_NUM_COMPONENTS,
};

View File

@ -1038,7 +1038,11 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
mlx5_init_reserved_gids(dev);
mlx5_init_clock(dev);
err = mlx5_init_clock(dev);
if (err) {
mlx5_core_err(dev, "failed to initialize hardware clock\n");
goto err_tables_cleanup;
}
dev->vxlan = mlx5_vxlan_create(dev);
dev->geneve = mlx5_geneve_create(dev);
@ -1046,7 +1050,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
err = mlx5_init_rl_table(dev);
if (err) {
mlx5_core_err(dev, "Failed to init rate limiting\n");
goto err_tables_cleanup;
goto err_clock_cleanup;
}
err = mlx5_mpfs_init(dev);
@ -1123,10 +1127,11 @@ err_mpfs_cleanup:
mlx5_mpfs_cleanup(dev);
err_rl_cleanup:
mlx5_cleanup_rl_table(dev);
err_tables_cleanup:
err_clock_cleanup:
mlx5_geneve_destroy(dev->geneve);
mlx5_vxlan_destroy(dev->vxlan);
mlx5_cleanup_clock(dev);
err_tables_cleanup:
mlx5_cleanup_reserved_gids(dev);
mlx5_cq_debugfs_cleanup(dev);
mlx5_fw_reset_cleanup(dev);
@ -1359,6 +1364,8 @@ static int mlx5_load(struct mlx5_core_dev *dev)
goto err_eq_table;
}
mlx5_clock_load(dev);
err = mlx5_fw_tracer_init(dev->tracer);
if (err) {
mlx5_core_err(dev, "Failed to init FW tracer %d\n", err);
@ -1442,6 +1449,7 @@ err_fpga_start:
mlx5_hv_vhca_cleanup(dev->hv_vhca);
mlx5_fw_reset_events_stop(dev);
mlx5_fw_tracer_cleanup(dev->tracer);
mlx5_clock_unload(dev);
mlx5_eq_table_destroy(dev);
err_eq_table:
mlx5_irq_table_destroy(dev);
@ -1468,6 +1476,7 @@ static void mlx5_unload(struct mlx5_core_dev *dev)
mlx5_hv_vhca_cleanup(dev->hv_vhca);
mlx5_fw_reset_events_stop(dev);
mlx5_fw_tracer_cleanup(dev->tracer);
mlx5_clock_unload(dev);
mlx5_eq_table_destroy(dev);
mlx5_irq_table_destroy(dev);
mlx5_pagealloc_stop(dev);

View File

@ -1105,6 +1105,9 @@ static const u32 mlx5e_ext_link_speed[MLX5E_EXT_LINK_MODES_NUMBER] = {
[MLX5E_200GAUI_2_200GBASE_CR2_KR2] = 200000,
[MLX5E_400GAUI_4_400GBASE_CR4_KR4] = 400000,
[MLX5E_800GAUI_8_800GBASE_CR8_KR8] = 800000,
[MLX5E_200GAUI_1_200GBASE_CR1_KR1] = 200000,
[MLX5E_400GAUI_2_400GBASE_CR2_KR2] = 400000,
[MLX5E_800GAUI_4_800GBASE_CR4_KR4] = 800000,
};
int mlx5_port_query_eth_proto(struct mlx5_core_dev *dev, u8 port, bool ext,

View File

@ -516,30 +516,6 @@ def_xa_destroy:
return NULL;
}
/* Assure synchronization of the device steering tables with updates made by SW
* insertion.
*/
int mlx5dr_domain_sync(struct mlx5dr_domain *dmn, u32 flags)
{
int ret = 0;
if (flags & MLX5DR_DOMAIN_SYNC_FLAGS_SW) {
mlx5dr_domain_lock(dmn);
ret = mlx5dr_send_ring_force_drain(dmn);
mlx5dr_domain_unlock(dmn);
if (ret) {
mlx5dr_err(dmn, "Force drain failed flags: %d, ret: %d\n",
flags, ret);
return ret;
}
}
if (flags & MLX5DR_DOMAIN_SYNC_FLAGS_HW)
ret = mlx5dr_cmd_sync_steering(dmn->mdev);
return ret;
}
int mlx5dr_domain_destroy(struct mlx5dr_domain *dmn)
{
if (WARN_ON_ONCE(refcount_read(&dmn->refcount) > 1))

View File

@ -1331,36 +1331,3 @@ void mlx5dr_send_ring_free(struct mlx5dr_domain *dmn,
kfree(send_ring->sync_buff);
kfree(send_ring);
}
int mlx5dr_send_ring_force_drain(struct mlx5dr_domain *dmn)
{
struct mlx5dr_send_ring *send_ring = dmn->send_ring;
struct postsend_info send_info = {};
u8 data[DR_STE_SIZE];
int num_of_sends_req;
int ret;
int i;
/* Sending this amount of requests makes sure we will get drain */
num_of_sends_req = send_ring->signal_th * TH_NUMS_TO_DRAIN / 2;
/* Send fake requests forcing the last to be signaled */
send_info.write.addr = (uintptr_t)data;
send_info.write.length = DR_STE_SIZE;
send_info.write.lkey = 0;
/* Using the sync_mr in order to write/read */
send_info.remote_addr = (uintptr_t)send_ring->sync_mr->addr;
send_info.rkey = send_ring->sync_mr->mkey;
for (i = 0; i < num_of_sends_req; i++) {
ret = dr_postsend_icm_data(dmn, &send_info);
if (ret)
return ret;
}
spin_lock(&send_ring->lock);
ret = dr_handle_pending_wc(dmn, send_ring);
spin_unlock(&send_ring->lock);
return ret;
}

View File

@ -1473,7 +1473,6 @@ struct mlx5dr_send_ring {
int mlx5dr_send_ring_alloc(struct mlx5dr_domain *dmn);
void mlx5dr_send_ring_free(struct mlx5dr_domain *dmn,
struct mlx5dr_send_ring *send_ring);
int mlx5dr_send_ring_force_drain(struct mlx5dr_domain *dmn);
int mlx5dr_send_postsend_ste(struct mlx5dr_domain *dmn,
struct mlx5dr_ste *ste,
u8 *data,

View File

@ -45,8 +45,6 @@ mlx5dr_domain_create(struct mlx5_core_dev *mdev, enum mlx5dr_domain_type type);
int mlx5dr_domain_destroy(struct mlx5dr_domain *domain);
int mlx5dr_domain_sync(struct mlx5dr_domain *domain, u32 flags);
void mlx5dr_domain_set_peer(struct mlx5dr_domain *dmn,
struct mlx5dr_domain *peer_dmn,
u16 peer_vhca_id);

View File

@ -754,9 +754,6 @@ void
mlxsw_sp_port_vlan_router_leave(struct mlxsw_sp_port_vlan *mlxsw_sp_port_vlan);
void mlxsw_sp_rif_destroy_by_dev(struct mlxsw_sp *mlxsw_sp,
struct net_device *dev);
bool mlxsw_sp_rif_exists(struct mlxsw_sp *mlxsw_sp,
const struct net_device *dev);
u16 mlxsw_sp_rif_vid(struct mlxsw_sp *mlxsw_sp, const struct net_device *dev);
u16 mlxsw_sp_router_port(const struct mlxsw_sp *mlxsw_sp);
int mlxsw_sp_router_nve_promote_decap(struct mlxsw_sp *mlxsw_sp, u32 ul_tb_id,
enum mlxsw_sp_l3proto ul_proto,

View File

@ -8184,41 +8184,6 @@ mlxsw_sp_rif_find_by_dev(const struct mlxsw_sp *mlxsw_sp,
return NULL;
}
bool mlxsw_sp_rif_exists(struct mlxsw_sp *mlxsw_sp,
const struct net_device *dev)
{
struct mlxsw_sp_rif *rif;
mutex_lock(&mlxsw_sp->router->lock);
rif = mlxsw_sp_rif_find_by_dev(mlxsw_sp, dev);
mutex_unlock(&mlxsw_sp->router->lock);
return rif;
}
u16 mlxsw_sp_rif_vid(struct mlxsw_sp *mlxsw_sp, const struct net_device *dev)
{
struct mlxsw_sp_rif *rif;
u16 vid = 0;
mutex_lock(&mlxsw_sp->router->lock);
rif = mlxsw_sp_rif_find_by_dev(mlxsw_sp, dev);
if (!rif)
goto out;
/* We only return the VID for VLAN RIFs. Otherwise we return an
* invalid value (0).
*/
if (rif->ops->type != MLXSW_SP_RIF_TYPE_VLAN)
goto out;
vid = mlxsw_sp_fid_8021q_vid(rif->fid);
out:
mutex_unlock(&mlxsw_sp->router->lock);
return vid;
}
static int mlxsw_sp_router_rif_disable(struct mlxsw_sp *mlxsw_sp, u16 rif)
{
char ritr_pl[MLXSW_REG_RITR_LEN];
@ -8417,19 +8382,6 @@ u16 mlxsw_sp_ipip_lb_rif_index(const struct mlxsw_sp_rif_ipip_lb *lb_rif)
return lb_rif->common.rif_index;
}
u16 mlxsw_sp_ipip_lb_ul_vr_id(const struct mlxsw_sp_rif_ipip_lb *lb_rif)
{
struct net_device *dev = mlxsw_sp_rif_dev(&lb_rif->common);
u32 ul_tb_id = mlxsw_sp_ipip_dev_ul_tb_id(dev);
struct mlxsw_sp_vr *ul_vr;
ul_vr = mlxsw_sp_vr_get(lb_rif->common.mlxsw_sp, ul_tb_id, NULL);
if (WARN_ON(IS_ERR(ul_vr)))
return 0;
return ul_vr->id;
}
u16 mlxsw_sp_ipip_lb_ul_rif_id(const struct mlxsw_sp_rif_ipip_lb *lb_rif)
{
return lb_rif->ul_rif_id;

View File

@ -90,7 +90,6 @@ struct mlxsw_sp_ipip_entry;
struct mlxsw_sp_rif *mlxsw_sp_rif_by_index(const struct mlxsw_sp *mlxsw_sp,
u16 rif_index);
u16 mlxsw_sp_ipip_lb_rif_index(const struct mlxsw_sp_rif_ipip_lb *rif);
u16 mlxsw_sp_ipip_lb_ul_vr_id(const struct mlxsw_sp_rif_ipip_lb *rif);
u16 mlxsw_sp_ipip_lb_ul_rif_id(const struct mlxsw_sp_rif_ipip_lb *lb_rif);
u32 mlxsw_sp_ipip_dev_ul_tb_id(const struct net_device *ol_dev);
int mlxsw_sp_rif_dev_ifindex(const struct mlxsw_sp_rif *rif);

View File

@ -10,6 +10,40 @@
static struct dentry *fbnic_dbg_root;
static void fbnic_dbg_desc_break(struct seq_file *s, int i)
{
while (i--)
seq_putc(s, '-');
seq_putc(s, '\n');
}
static int fbnic_dbg_mac_addr_show(struct seq_file *s, void *v)
{
struct fbnic_dev *fbd = s->private;
char hdr[80];
int i;
/* Generate Header */
snprintf(hdr, sizeof(hdr), "%3s %s %-17s %s\n",
"Idx", "S", "TCAM Bitmap", "Addr/Mask");
seq_puts(s, hdr);
fbnic_dbg_desc_break(s, strnlen(hdr, sizeof(hdr)));
for (i = 0; i < FBNIC_RPC_TCAM_MACDA_NUM_ENTRIES; i++) {
struct fbnic_mac_addr *mac_addr = &fbd->mac_addr[i];
seq_printf(s, "%02d %d %64pb %pm\n",
i, mac_addr->state, mac_addr->act_tcam,
mac_addr->value.addr8);
seq_printf(s, " %pm\n",
mac_addr->mask.addr8);
}
return 0;
}
DEFINE_SHOW_ATTRIBUTE(fbnic_dbg_mac_addr);
static int fbnic_dbg_pcie_stats_show(struct seq_file *s, void *v)
{
struct fbnic_dev *fbd = s->private;
@ -48,6 +82,8 @@ void fbnic_dbg_fbd_init(struct fbnic_dev *fbd)
fbd->dbg_fbd = debugfs_create_dir(name, fbnic_dbg_root);
debugfs_create_file("pcie_stats", 0400, fbd->dbg_fbd, fbd,
&fbnic_dbg_pcie_stats_fops);
debugfs_create_file("mac_addr", 0400, fbd->dbg_fbd, fbd,
&fbnic_dbg_mac_addr_fops);
}
void fbnic_dbg_fbd_exit(struct fbnic_dev *fbd)

View File

@ -628,6 +628,8 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
fbnic_rss_key_fill(fbn->rss_key);
fbnic_rss_init_en_mask(fbn);
netdev->priv_flags |= IFF_UNICAST_FLT;
netdev->features |=
NETIF_F_RXHASH |
NETIF_F_SG |

View File

@ -3033,7 +3033,7 @@ static void qed_iov_vf_mbx_vport_update(struct qed_hwfn *p_hwfn,
u16 length;
int rc;
/* Valiate PF can send such a request */
/* Validate PF can send such a request */
if (!vf->vport_instance) {
DP_VERBOSE(p_hwfn,
QED_MSG_IOV,
@ -3312,7 +3312,7 @@ static void qed_iov_vf_mbx_ucast_filter(struct qed_hwfn *p_hwfn,
goto out;
}
/* Determine if the unicast filtering is acceptible by PF */
/* Determine if the unicast filtering is acceptable by PF */
if ((p_bulletin->valid_bitmap & BIT(VLAN_ADDR_FORCED)) &&
(params.type == QED_FILTER_VLAN ||
params.type == QED_FILTER_MAC_VLAN)) {
@ -3729,7 +3729,7 @@ qed_iov_execute_vf_flr_cleanup(struct qed_hwfn *p_hwfn,
rc = qed_iov_enable_vf_access(p_hwfn, p_ptt, p_vf);
if (rc) {
DP_ERR(p_hwfn, "Failed to re-enable VF[%d] acces\n",
DP_ERR(p_hwfn, "Failed to re-enable VF[%d] access\n",
vfid);
return rc;
}
@ -4480,7 +4480,7 @@ int qed_sriov_disable(struct qed_dev *cdev, bool pci_enabled)
struct qed_ptt *ptt = qed_ptt_acquire(hwfn);
/* Failure to acquire the ptt in 100g creates an odd error
* where the first engine has already relased IOV.
* where the first engine has already released IOV.
*/
if (!ptt) {
DP_ERR(hwfn, "Failed to acquire ptt\n");

View File

@ -114,7 +114,8 @@ config R8169
will be called r8169. This is recommended.
config R8169_LEDS
def_bool R8169 && LEDS_TRIGGER_NETDEV
bool "Support for controlling the NIC LEDs"
depends on R8169 && LEDS_TRIGGER_NETDEV
depends on !(R8169=y && LEDS_CLASS=m)
help
Optional support for controlling the NIC LED's with the netdev

View File

@ -5222,6 +5222,7 @@ static int r8169_mdio_register(struct rtl8169_private *tp)
new_bus->priv = tp;
new_bus->parent = &pdev->dev;
new_bus->irq[0] = PHY_MAC_INTERRUPT;
new_bus->phy_mask = GENMASK(31, 1);
snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x-%x",
pci_domain_nr(pdev->bus), pci_dev_id(pdev));

View File

@ -13,7 +13,7 @@
*/
const char *phy_speed_to_str(int speed)
{
BUILD_BUG_ON_MSG(__ETHTOOL_LINK_MODE_MASK_NBITS != 103,
BUILD_BUG_ON_MSG(__ETHTOOL_LINK_MODE_MASK_NBITS != 121,
"Enum ethtool_link_mode_bit_indices and phylib are out of sync. "
"If a speed or mode has been added please update phy_speed_to_str "
"and the PHY settings array.\n");
@ -169,6 +169,12 @@ static const struct phy_setting settings[] = {
PHY_SETTING( 800000, FULL, 800000baseDR8_2_Full ),
PHY_SETTING( 800000, FULL, 800000baseSR8_Full ),
PHY_SETTING( 800000, FULL, 800000baseVR8_Full ),
PHY_SETTING( 800000, FULL, 800000baseCR4_Full ),
PHY_SETTING( 800000, FULL, 800000baseKR4_Full ),
PHY_SETTING( 800000, FULL, 800000baseDR4_Full ),
PHY_SETTING( 800000, FULL, 800000baseDR4_2_Full ),
PHY_SETTING( 800000, FULL, 800000baseSR4_Full ),
PHY_SETTING( 800000, FULL, 800000baseVR4_Full ),
/* 400G */
PHY_SETTING( 400000, FULL, 400000baseCR8_Full ),
PHY_SETTING( 400000, FULL, 400000baseKR8_Full ),
@ -180,6 +186,12 @@ static const struct phy_setting settings[] = {
PHY_SETTING( 400000, FULL, 400000baseLR4_ER4_FR4_Full ),
PHY_SETTING( 400000, FULL, 400000baseDR4_Full ),
PHY_SETTING( 400000, FULL, 400000baseSR4_Full ),
PHY_SETTING( 400000, FULL, 400000baseCR2_Full ),
PHY_SETTING( 400000, FULL, 400000baseKR2_Full ),
PHY_SETTING( 400000, FULL, 400000baseDR2_Full ),
PHY_SETTING( 400000, FULL, 400000baseDR2_2_Full ),
PHY_SETTING( 400000, FULL, 400000baseSR2_Full ),
PHY_SETTING( 400000, FULL, 400000baseVR2_Full ),
/* 200G */
PHY_SETTING( 200000, FULL, 200000baseCR4_Full ),
PHY_SETTING( 200000, FULL, 200000baseKR4_Full ),
@ -191,6 +203,12 @@ static const struct phy_setting settings[] = {
PHY_SETTING( 200000, FULL, 200000baseLR2_ER2_FR2_Full ),
PHY_SETTING( 200000, FULL, 200000baseDR2_Full ),
PHY_SETTING( 200000, FULL, 200000baseSR2_Full ),
PHY_SETTING( 200000, FULL, 200000baseCR_Full ),
PHY_SETTING( 200000, FULL, 200000baseKR_Full ),
PHY_SETTING( 200000, FULL, 200000baseDR_Full ),
PHY_SETTING( 200000, FULL, 200000baseDR_2_Full ),
PHY_SETTING( 200000, FULL, 200000baseSR_Full ),
PHY_SETTING( 200000, FULL, 200000baseVR_Full ),
/* 100G */
PHY_SETTING( 100000, FULL, 100000baseCR4_Full ),
PHY_SETTING( 100000, FULL, 100000baseKR4_Full ),

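The BUILD_BUG_ON bump from 103 to 121 above is just bookkeeping for the eighteen new link modes: six each at 200G, 400G and 800G occupy bits 103 through 120, so the ethtool mask now spans 121 bits. A small compile-time check of that arithmetic, assuming UAPI headers that already carry the new ethtool bits:

#include <linux/ethtool.h>

/* 3 speeds x 6 new lane variants = 18 new modes, bits 103..120 inclusive. */
_Static_assert(ETHTOOL_LINK_MODE_200000baseCR_Full_BIT == 103,
	       "first new link mode bit");
_Static_assert(ETHTOOL_LINK_MODE_800000baseVR4_Full_BIT == 120,
	       "last new link mode bit");
_Static_assert(__ETHTOOL_LINK_MODE_MASK_NBITS == 121,
	       "phy_speed_to_str() BUILD_BUG_ON must track this count");
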
View File

@ -4,8 +4,12 @@ config REALTEK_PHY
help
Currently supports RTL821x/RTL822x and fast ethernet PHYs
if REALTEK_PHY
config REALTEK_PHY_HWMON
def_bool REALTEK_PHY && HWMON
depends on !(REALTEK_PHY=y && HWMON=m)
bool "HWMON support for Realtek PHYs"
depends on HWMON && !(REALTEK_PHY=y && HWMON=m)
help
Optional hwmon support for the temperature sensor
endif # REALTEK_PHY

View File

@ -13,6 +13,7 @@
#include <linux/module.h>
#include <linux/delay.h>
#include <linux/clk.h>
#include <linux/string_choices.h>
#include "realtek.h"
@ -422,11 +423,11 @@ static int rtl8211f_config_init(struct phy_device *phydev)
} else if (ret) {
dev_dbg(dev,
"%s 2ns TX delay (and changing the value from pin-strapping RXD1 or the bootloader)\n",
val_txdly ? "Enabling" : "Disabling");
str_enable_disable(val_txdly));
} else {
dev_dbg(dev,
"2ns TX delay was already %s (by pin-strapping RXD1 or bootloader configuration)\n",
val_txdly ? "enabled" : "disabled");
str_enabled_disabled(val_txdly));
}
ret = phy_modify_paged_changed(phydev, 0xd08, 0x15, RTL8211F_RX_DELAY,
@ -437,11 +438,11 @@ static int rtl8211f_config_init(struct phy_device *phydev)
} else if (ret) {
dev_dbg(dev,
"%s 2ns RX delay (and changing the value from pin-strapping RXD0 or the bootloader)\n",
val_rxdly ? "Enabling" : "Disabling");
str_enable_disable(val_rxdly));
} else {
dev_dbg(dev,
"2ns RX delay was already %s (by pin-strapping RXD0 or bootloader configuration)\n",
val_rxdly ? "enabled" : "disabled");
str_enabled_disabled(val_rxdly));
}
if (priv->has_phycr2) {

View File

@ -227,9 +227,9 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
be32_to_cpu(fdb->vni)))
goto nla_put_failure;
ci.ndm_used = jiffies_to_clock_t(now - fdb->used);
ci.ndm_used = jiffies_to_clock_t(now - READ_ONCE(fdb->used));
ci.ndm_confirmed = 0;
ci.ndm_updated = jiffies_to_clock_t(now - fdb->updated);
ci.ndm_updated = jiffies_to_clock_t(now - READ_ONCE(fdb->updated));
ci.ndm_refcnt = 0;
if (nla_put(skb, NDA_CACHEINFO, sizeof(ci), &ci))
@ -434,8 +434,12 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
struct vxlan_fdb *f;
f = __vxlan_find_mac(vxlan, mac, vni);
if (f && f->used != jiffies)
f->used = jiffies;
if (f) {
unsigned long now = jiffies;
if (READ_ONCE(f->used) != now)
WRITE_ONCE(f->used, now);
}
return f;
}
@ -1009,12 +1013,10 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
!(f->flags & NTF_VXLAN_ADDED_BY_USER)) {
if (f->state != state) {
f->state = state;
f->updated = jiffies;
notify = 1;
}
if (f->flags != fdb_flags) {
f->flags = fdb_flags;
f->updated = jiffies;
notify = 1;
}
}
@ -1048,12 +1050,13 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
}
if (ndm_flags & NTF_USE)
f->used = jiffies;
WRITE_ONCE(f->updated, jiffies);
if (notify) {
if (rd == NULL)
rd = first_remote_rtnl(f);
WRITE_ONCE(f->updated, jiffies);
err = vxlan_fdb_notify(vxlan, f, rd, RTM_NEWNEIGH,
swdev_notify, extack);
if (err)
@ -1292,7 +1295,7 @@ int __vxlan_fdb_delete(struct vxlan_dev *vxlan,
struct vxlan_fdb *f;
int err = -ENOENT;
f = vxlan_find_mac(vxlan, addr, src_vni);
f = __vxlan_find_mac(vxlan, addr, src_vni);
if (!f)
return err;
@ -1459,9 +1462,13 @@ static enum skb_drop_reason vxlan_snoop(struct net_device *dev,
ifindex = src_ifindex;
#endif
f = vxlan_find_mac(vxlan, src_mac, vni);
f = __vxlan_find_mac(vxlan, src_mac, vni);
if (likely(f)) {
struct vxlan_rdst *rdst = first_remote_rcu(f);
unsigned long now = jiffies;
if (READ_ONCE(f->updated) != now)
WRITE_ONCE(f->updated, now);
if (likely(vxlan_addr_equal(&rdst->remote_ip, src_ip) &&
rdst->remote_ifindex == ifindex))
@ -1481,7 +1488,6 @@ static enum skb_drop_reason vxlan_snoop(struct net_device *dev,
src_mac, &rdst->remote_ip.sa, &src_ip->sa);
rdst->remote_ip = *src_ip;
f->updated = jiffies;
vxlan_fdb_notify(vxlan, f, rdst, RTM_NEWNEIGH, true, NULL);
} else {
u32 hash_index = fdb_head_index(vxlan, src_mac, vni);
@ -2852,7 +2858,7 @@ static void vxlan_cleanup(struct timer_list *t)
if (f->flags & NTF_EXT_LEARNED)
continue;
timeout = f->used + vxlan->cfg.age_interval * HZ;
timeout = READ_ONCE(f->updated) + vxlan->cfg.age_interval * HZ;
if (time_before_eq(timeout, jiffies)) {
netdev_dbg(vxlan->dev,
"garbage collect %pM\n",
@ -4768,7 +4774,7 @@ vxlan_fdb_offloaded_set(struct net_device *dev,
spin_lock_bh(&vxlan->hash_lock[hash_index]);
f = vxlan_find_mac(vxlan, fdb_info->eth_addr, fdb_info->vni);
f = __vxlan_find_mac(vxlan, fdb_info->eth_addr, fdb_info->vni);
if (!f)
goto out;
@ -4824,7 +4830,7 @@ vxlan_fdb_external_learn_del(struct net_device *dev,
hash_index = fdb_head_index(vxlan, fdb_info->eth_addr, fdb_info->vni);
spin_lock_bh(&vxlan->hash_lock[hash_index]);
f = vxlan_find_mac(vxlan, fdb_info->eth_addr, fdb_info->vni);
f = __vxlan_find_mac(vxlan, fdb_info->eth_addr, fdb_info->vni);
if (!f)
err = -ENOENT;
else if (f->flags & NTF_EXT_LEARNED)

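The vxlan hunks above move fdb->used/fdb->updated accesses to READ_ONCE()/WRITE_ONCE() and refresh `updated` from the receive path, so the aging timer can read the timestamp without the hash lock. A minimal sketch of that annotation pattern on a hypothetical entry type (names are illustrative, not the vxlan code):

#include <linux/compiler.h>
#include <linux/jiffies.h>

struct demo_entry {
	unsigned long updated;	/* touched locklessly from the fast path */
};

/* Fast path: refresh the timestamp without taking a lock. */
static inline void demo_entry_touch(struct demo_entry *e)
{
	unsigned long now = jiffies;

	/* Avoid dirtying the cache line when nothing changed. */
	if (READ_ONCE(e->updated) != now)
		WRITE_ONCE(e->updated, now);
}

/* Aging path: a single marked read, then the usual jiffies comparison. */
static inline bool demo_entry_expired(const struct demo_entry *e,
				      unsigned long age_jiffies)
{
	return time_before_eq(READ_ONCE(e->updated) + age_jiffies, jiffies);
}
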
View File

@ -2,15 +2,6 @@
menu "S/390 network device drivers"
depends on NETDEVICES && S390
config LCS
def_tristate m
prompt "Lan Channel Station Interface"
depends on CCW && NETDEVICES && ETHERNET
help
Select this option if you want to use LCS networking on IBM System z.
To compile as a module, choose M. The module name is lcs.
If you do not use LCS, choose N.
config CTCM
def_tristate m
prompt "CTC and MPC SNA device support"
@ -98,7 +89,7 @@ config QETH_OSX
config CCWGROUP
tristate
default (LCS || CTCM || QETH || SMC)
default (CTCM || QETH || SMC)
config ISM
tristate "Support for ISM vPCI Adapter"

View File

@ -8,7 +8,6 @@ obj-$(CONFIG_CTCM) += ctcm.o fsm.o
obj-$(CONFIG_NETIUCV) += netiucv.o fsm.o
obj-$(CONFIG_SMSGIUCV) += smsgiucv.o
obj-$(CONFIG_SMSGIUCV_EVENT) += smsgiucv_app.o
obj-$(CONFIG_LCS) += lcs.o
qeth-y += qeth_core_sys.o qeth_core_main.o qeth_core_mpc.o qeth_ethtool.o
obj-$(CONFIG_QETH) += qeth.o
qeth_l2-y += qeth_l2_main.o qeth_l2_sys.o

File diff suppressed because it is too large

View File

@ -1,342 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*lcs.h*/
#include <linux/interrupt.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/workqueue.h>
#include <linux/refcount.h>
#include <asm/ccwdev.h>
#define LCS_DBF_TEXT(level, name, text) \
do { \
debug_text_event(lcs_dbf_##name, level, text); \
} while (0)
#define LCS_DBF_HEX(level,name,addr,len) \
do { \
debug_event(lcs_dbf_##name,level,(void*)(addr),len); \
} while (0)
#define LCS_DBF_TEXT_(level,name,text...) \
do { \
if (debug_level_enabled(lcs_dbf_##name, level)) { \
scnprintf(debug_buffer, sizeof(debug_buffer), text); \
debug_text_event(lcs_dbf_##name, level, debug_buffer); \
} \
} while (0)
/**
* sysfs related stuff
*/
#define CARD_FROM_DEV(cdev) \
(struct lcs_card *) dev_get_drvdata( \
&((struct ccwgroup_device *)dev_get_drvdata(&cdev->dev))->dev);
/**
* Enum for classifying detected devices.
*/
enum lcs_channel_types {
/* Device is not a channel */
lcs_channel_type_none,
/* Device is a 2216 channel */
lcs_channel_type_parallel,
/* Device is a 2216 channel */
lcs_channel_type_2216,
/* Device is a OSA2 card */
lcs_channel_type_osa2
};
/**
* CCW commands used in this driver
*/
#define LCS_CCW_WRITE 0x01
#define LCS_CCW_READ 0x02
#define LCS_CCW_TRANSFER 0x08
/**
* LCS device status primitives
*/
#define LCS_CMD_STARTLAN 0x01
#define LCS_CMD_STOPLAN 0x02
#define LCS_CMD_LANSTAT 0x04
#define LCS_CMD_STARTUP 0x07
#define LCS_CMD_SHUTDOWN 0x08
#define LCS_CMD_QIPASSIST 0xb2
#define LCS_CMD_SETIPM 0xb4
#define LCS_CMD_DELIPM 0xb5
#define LCS_INITIATOR_TCPIP 0x00
#define LCS_INITIATOR_LGW 0x01
#define LCS_STD_CMD_SIZE 16
#define LCS_MULTICAST_CMD_SIZE 404
/**
* LCS IPASSIST MASKS,only used when multicast is switched on
*/
/* Not supported by LCS */
#define LCS_IPASS_ARP_PROCESSING 0x0001
#define LCS_IPASS_IN_CHECKSUM_SUPPORT 0x0002
#define LCS_IPASS_OUT_CHECKSUM_SUPPORT 0x0004
#define LCS_IPASS_IP_FRAG_REASSEMBLY 0x0008
#define LCS_IPASS_IP_FILTERING 0x0010
/* Supported by lcs 3172 */
#define LCS_IPASS_IPV6_SUPPORT 0x0020
#define LCS_IPASS_MULTICAST_SUPPORT 0x0040
/**
* LCS sense byte definitions
*/
#define LCS_SENSE_BYTE_0 0
#define LCS_SENSE_BYTE_1 1
#define LCS_SENSE_BYTE_2 2
#define LCS_SENSE_BYTE_3 3
#define LCS_SENSE_INTERFACE_DISCONNECT 0x01
#define LCS_SENSE_EQUIPMENT_CHECK 0x10
#define LCS_SENSE_BUS_OUT_CHECK 0x20
#define LCS_SENSE_INTERVENTION_REQUIRED 0x40
#define LCS_SENSE_CMD_REJECT 0x80
#define LCS_SENSE_RESETTING_EVENT 0x80
#define LCS_SENSE_DEVICE_ONLINE 0x20
/**
* LCS packet type definitions
*/
#define LCS_FRAME_TYPE_CONTROL 0
#define LCS_FRAME_TYPE_ENET 1
#define LCS_FRAME_TYPE_TR 2
#define LCS_FRAME_TYPE_FDDI 7
#define LCS_FRAME_TYPE_AUTO -1
/**
* some more definitions,we will sort them later
*/
#define LCS_ILLEGAL_OFFSET 0xffff
#define LCS_IOBUFFERSIZE 0x5000
#define LCS_NUM_BUFFS 32 /* needs to be power of 2 */
#define LCS_MAC_LENGTH 6
#define LCS_INVALID_PORT_NO -1
#define LCS_LANCMD_TIMEOUT_DEFAULT 5
/**
* Multicast state
*/
#define LCS_IPM_STATE_SET_REQUIRED 0
#define LCS_IPM_STATE_DEL_REQUIRED 1
#define LCS_IPM_STATE_ON_CARD 2
/**
* LCS IP Assist declarations
* seems to be only used for multicast
*/
#define LCS_IPASS_ARP_PROCESSING 0x0001
#define LCS_IPASS_INBOUND_CSUM_SUPP 0x0002
#define LCS_IPASS_OUTBOUND_CSUM_SUPP 0x0004
#define LCS_IPASS_IP_FRAG_REASSEMBLY 0x0008
#define LCS_IPASS_IP_FILTERING 0x0010
#define LCS_IPASS_IPV6_SUPPORT 0x0020
#define LCS_IPASS_MULTICAST_SUPPORT 0x0040
/**
* LCS Buffer states
*/
enum lcs_buffer_states {
LCS_BUF_STATE_EMPTY, /* buffer is empty */
LCS_BUF_STATE_LOCKED, /* buffer is locked, don't touch */
LCS_BUF_STATE_READY, /* buffer is ready for read/write */
LCS_BUF_STATE_PROCESSED,
};
/**
* LCS Channel State Machine declarations
*/
enum lcs_channel_states {
LCS_CH_STATE_INIT,
LCS_CH_STATE_HALTED,
LCS_CH_STATE_STOPPED,
LCS_CH_STATE_RUNNING,
LCS_CH_STATE_SUSPENDED,
LCS_CH_STATE_CLEARED,
LCS_CH_STATE_ERROR,
};
/**
* LCS device state machine
*/
enum lcs_dev_states {
DEV_STATE_DOWN,
DEV_STATE_UP,
DEV_STATE_RECOVER,
};
enum lcs_threads {
LCS_SET_MC_THREAD = 1,
LCS_RECOVERY_THREAD = 2,
};
/**
* LCS struct declarations
*/
struct lcs_header {
__u16 offset;
__u8 type;
__u8 slot;
} __attribute__ ((packed));
struct lcs_ip_mac_pair {
__be32 ip_addr;
__u8 mac_addr[LCS_MAC_LENGTH];
__u8 reserved[2];
} __attribute__ ((packed));
struct lcs_ipm_list {
struct list_head list;
struct lcs_ip_mac_pair ipm;
__u8 ipm_state;
};
struct lcs_cmd {
__u16 offset;
__u8 type;
__u8 slot;
__u8 cmd_code;
__u8 initiator;
__u16 sequence_no;
__u16 return_code;
union {
struct {
__u8 lan_type;
__u8 portno;
__u16 parameter_count;
__u8 operator_flags[3];
__u8 reserved[3];
} lcs_std_cmd;
struct {
__u16 unused1;
__u16 buff_size;
__u8 unused2[6];
} lcs_startup;
struct {
__u8 lan_type;
__u8 portno;
__u8 unused[10];
__u8 mac_addr[LCS_MAC_LENGTH];
__u32 num_packets_deblocked;
__u32 num_packets_blocked;
__u32 num_packets_tx_on_lan;
__u32 num_tx_errors_detected;
__u32 num_tx_packets_disgarded;
__u32 num_packets_rx_from_lan;
__u32 num_rx_errors_detected;
__u32 num_rx_discarded_nobuffs_avail;
__u32 num_rx_packets_too_large;
} lcs_lanstat_cmd;
#ifdef CONFIG_IP_MULTICAST
struct {
__u8 lan_type;
__u8 portno;
__u16 num_ip_pairs;
__u16 ip_assists_supported;
__u16 ip_assists_enabled;
__u16 version;
struct {
struct lcs_ip_mac_pair
ip_mac_pair[32];
__u32 response_data;
} lcs_ipass_ctlmsg __attribute ((packed));
} lcs_qipassist __attribute__ ((packed));
#endif /*CONFIG_IP_MULTICAST */
} cmd __attribute__ ((packed));
} __attribute__ ((packed));
/**
* Forward declarations.
*/
struct lcs_card;
struct lcs_channel;
/**
* Definition of an lcs buffer.
*/
struct lcs_buffer {
enum lcs_buffer_states state;
void *data;
int count;
/* Callback for completion notification. */
void (*callback)(struct lcs_channel *, struct lcs_buffer *);
};
struct lcs_reply {
struct list_head list;
__u16 sequence_no;
refcount_t refcnt;
/* Callback for completion notification. */
void (*callback)(struct lcs_card *, struct lcs_cmd *);
wait_queue_head_t wait_q;
struct lcs_card *card;
struct timer_list timer;
int received;
int rc;
};
/**
* Definition of an lcs channel
*/
struct lcs_channel {
enum lcs_channel_states state;
struct ccw_device *ccwdev;
struct ccw1 ccws[LCS_NUM_BUFFS + 1];
wait_queue_head_t wait_q;
struct tasklet_struct irq_tasklet;
struct lcs_buffer iob[LCS_NUM_BUFFS];
int io_idx;
int buf_idx;
};
/**
* definition of the lcs card
*/
struct lcs_card {
spinlock_t lock;
spinlock_t ipm_lock;
enum lcs_dev_states state;
struct net_device *dev;
struct net_device_stats stats;
__be16 (*lan_type_trans)(struct sk_buff *skb,
struct net_device *dev);
struct ccwgroup_device *gdev;
struct lcs_channel read;
struct lcs_channel write;
struct lcs_buffer *tx_buffer;
int tx_emitted;
struct list_head lancmd_waiters;
int lancmd_timeout;
struct work_struct kernel_thread_starter;
spinlock_t mask_lock;
unsigned long thread_start_mask;
unsigned long thread_running_mask;
unsigned long thread_allowed_mask;
wait_queue_head_t wait_q;
#ifdef CONFIG_IP_MULTICAST
struct list_head ipm_list;
#endif
__u8 mac[LCS_MAC_LENGTH];
__u16 ip_assists_supported;
__u16 ip_assists_enabled;
__s8 lan_type;
__u32 pkt_seq;
__u16 sequence_no;
__s16 portno;
/* Some info copied from probeinfo */
u8 device_forced;
u8 max_port_no;
u8 hint_port_no;
s16 port_protocol_no;
} __attribute__ ((aligned(8)));

View File

@ -40,6 +40,8 @@ enum io_uring_cmd_flags {
IO_URING_F_TASK_DEAD = (1 << 13),
};
struct io_zcrx_ifq;
struct io_wq_work_node {
struct io_wq_work_node *next;
};
@ -384,6 +386,8 @@ struct io_ring_ctx {
struct wait_queue_head poll_wq;
struct io_restriction restrictions;
struct io_zcrx_ifq *ifq;
u32 pers_next;
struct xarray personalities;
@ -436,6 +440,8 @@ struct io_ring_ctx {
struct io_mapped_region ring_region;
/* used for optimised request parameter and wait argument passing */
struct io_mapped_region param_region;
/* just one zcrx per ring for now, will move to io_zcrx_ifq eventually */
struct io_mapped_region zcrx_region;
};
/*

View File

@ -1415,7 +1415,6 @@ int mlx4_get_is_vlan_offload_disabled(struct mlx4_dev *dev, u8 port,
bool *vlan_offload_disabled);
void mlx4_handle_eth_header_mcast_prio(struct mlx4_net_trans_rule_hw_ctrl *ctrl,
struct _rule_hw *eth_header);
int mlx4_find_cached_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *idx);
int mlx4_find_cached_vlan(struct mlx4_dev *dev, u8 port, u16 vid, int *idx);
int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index);
void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, u16 vlan);

View File

@ -54,7 +54,6 @@
#include <linux/mlx5/doorbell.h>
#include <linux/mlx5/eq.h>
#include <linux/timecounter.h>
#include <linux/ptp_clock_kernel.h>
#include <net/devlink.h>
#define MLX5_ADEV_NAME "mlx5_core"
@ -679,33 +678,8 @@ struct mlx5_rsvd_gids {
struct ida ida;
};
#define MAX_PIN_NUM 8
struct mlx5_pps {
u8 pin_caps[MAX_PIN_NUM];
struct work_struct out_work;
u64 start[MAX_PIN_NUM];
u8 enabled;
u64 min_npps_period;
u64 min_out_pulse_duration_ns;
};
struct mlx5_timer {
struct cyclecounter cycles;
struct timecounter tc;
u32 nominal_c_mult;
unsigned long overflow_period;
};
struct mlx5_clock {
struct mlx5_nb pps_nb;
seqlock_t lock;
struct hwtstamp_config hwtstamp_config;
struct ptp_clock *ptp;
struct ptp_clock_info ptp_info;
struct mlx5_pps pps_info;
struct mlx5_timer timer;
};
struct mlx5_clock;
struct mlx5_clock_dev_state;
struct mlx5_dm;
struct mlx5_fw_tracer;
struct mlx5_vxlan;
@ -789,7 +763,8 @@ struct mlx5_core_dev {
#ifdef CONFIG_MLX5_FPGA
struct mlx5_fpga_device *fpga;
#endif
struct mlx5_clock clock;
struct mlx5_clock *clock;
struct mlx5_clock_dev_state *clock_state;
struct mlx5_ib_clock_info *clock_info;
struct mlx5_fw_tracer *tracer;
struct mlx5_rsc_dump *rsc_dump;

View File

@ -115,9 +115,12 @@ enum mlx5e_ext_link_mode {
MLX5E_100GAUI_1_100GBASE_CR_KR = 11,
MLX5E_200GAUI_4_200GBASE_CR4_KR4 = 12,
MLX5E_200GAUI_2_200GBASE_CR2_KR2 = 13,
MLX5E_200GAUI_1_200GBASE_CR1_KR1 = 14,
MLX5E_400GAUI_8_400GBASE_CR8 = 15,
MLX5E_400GAUI_4_400GBASE_CR4_KR4 = 16,
MLX5E_400GAUI_2_400GBASE_CR2_KR2 = 17,
MLX5E_800GAUI_8_800GBASE_CR8_KR8 = 19,
MLX5E_800GAUI_4_800GBASE_CR4_KR4 = 20,
MLX5E_EXT_LINK_MODES_NUMBER,
};

View File

@ -658,6 +658,7 @@ struct netdev_queue {
struct Qdisc __rcu *qdisc_sleeping;
#ifdef CONFIG_SYSFS
struct kobject kobj;
const struct attribute_group **groups;
#endif
unsigned long tx_maxrate;
/*

View File

@ -43,6 +43,7 @@ extern void rtnl_lock(void);
extern void rtnl_unlock(void);
extern int rtnl_trylock(void);
extern int rtnl_is_locked(void);
extern int rtnl_lock_interruptible(void);
extern int rtnl_lock_killable(void);
extern bool refcount_dec_and_rtnl_lock(refcount_t *r);

View File

@ -16,6 +16,7 @@ struct netdev_rx_queue {
struct rps_dev_flow_table __rcu *rps_flow_table;
#endif
struct kobject kobj;
const struct attribute_group **groups;
struct net_device *dev;
netdevice_tracker dev_tracker;

View File

@ -24,11 +24,20 @@ struct net_iov {
unsigned long __unused_padding;
unsigned long pp_magic;
struct page_pool *pp;
struct dmabuf_genpool_chunk_owner *owner;
struct net_iov_area *owner;
unsigned long dma_addr;
atomic_long_t pp_ref_count;
};
struct net_iov_area {
/* Array of net_iovs for this area. */
struct net_iov *niovs;
size_t num_niovs;
/* Offset into the dma-buf where this chunk starts. */
unsigned long base_virtual;
};
/* These fields in struct page are used by the page_pool and net stack:
*
* struct {
@ -54,6 +63,16 @@ NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
#undef NET_IOV_ASSERT_OFFSET
static inline struct net_iov_area *net_iov_owner(const struct net_iov *niov)
{
return niov->owner;
}
static inline unsigned int net_iov_idx(const struct net_iov *niov)
{
return niov - net_iov_owner(niov)->niovs;
}
/* netmem */
/**

View File

@ -0,0 +1,45 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _NET_PAGE_POOL_MEMORY_PROVIDER_H
#define _NET_PAGE_POOL_MEMORY_PROVIDER_H
#include <net/netmem.h>
#include <net/page_pool/types.h>
struct netdev_rx_queue;
struct sk_buff;
struct memory_provider_ops {
netmem_ref (*alloc_netmems)(struct page_pool *pool, gfp_t gfp);
bool (*release_netmem)(struct page_pool *pool, netmem_ref netmem);
int (*init)(struct page_pool *pool);
void (*destroy)(struct page_pool *pool);
int (*nl_fill)(void *mp_priv, struct sk_buff *rsp,
struct netdev_rx_queue *rxq);
void (*uninstall)(void *mp_priv, struct netdev_rx_queue *rxq);
};
bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr);
void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov);
void net_mp_niov_clear_page_pool(struct net_iov *niov);
int net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
struct pp_memory_provider_params *p);
void net_mp_close_rxq(struct net_device *dev, unsigned ifq_idx,
struct pp_memory_provider_params *old_p);
/**
* net_mp_netmem_place_in_cache() - give a netmem to a page pool
* @pool: the page pool to place the netmem into
* @netmem: netmem to give
*
* Push an accounted netmem into the page pool's allocation cache. The caller
* must ensure that there is space in the cache. It should only be called off
* the mp_ops->alloc_netmems() path.
*/
static inline void net_mp_netmem_place_in_cache(struct page_pool *pool,
netmem_ref netmem)
{
pool->alloc.cache[pool->alloc.count++] = netmem;
}
#endif
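
struct memory_provider_ops above is the whole contract a provider implements; the io_uring zcrx provider later in this merge is one instance, the devmem TCP provider is the other. For orientation only, a non-functional skeleton wiring the callbacks might look like the following (stub behaviour is assumed, not taken from this series):

#include <net/page_pool/memory_provider.h>

/* Hypothetical no-op provider: never hands out memory. */
static netmem_ref demo_mp_alloc_netmems(struct page_pool *pool, gfp_t gfp)
{
	return 0;	/* treated as an allocation failure by the pool */
}

static bool demo_mp_release_netmem(struct page_pool *pool, netmem_ref netmem)
{
	/* false: the pool must not do the final put, mirroring the
	 * providers in this series. */
	return false;
}

static int demo_mp_init(struct page_pool *pool)
{
	return 0;	/* a real provider validates pool parameters here */
}

static void demo_mp_destroy(struct page_pool *pool)
{
}

static int demo_mp_nl_fill(void *mp_priv, struct sk_buff *rsp,
			   struct netdev_rx_queue *rxq)
{
	return 0;	/* nothing extra to report over netlink */
}

static void demo_mp_uninstall(void *mp_priv, struct netdev_rx_queue *rxq)
{
}

static const struct memory_provider_ops demo_mp_ops = {
	.alloc_netmems	= demo_mp_alloc_netmems,
	.release_netmem	= demo_mp_release_netmem,
	.init		= demo_mp_init,
	.destroy	= demo_mp_destroy,
	.nl_fill	= demo_mp_nl_fill,
	.uninstall	= demo_mp_uninstall,
};

A real provider is handed to a queue through struct pp_memory_provider_params (mp_ops/mp_priv) and net_mp_open_rxq(), as the zcrx registration code below does.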

View File

@ -152,8 +152,11 @@ struct page_pool_stats {
*/
#define PAGE_POOL_FRAG_GROUP_ALIGN (4 * sizeof(long))
struct memory_provider_ops;
struct pp_memory_provider_params {
void *mp_priv;
const struct memory_provider_ops *mp_ops;
};
struct page_pool {
@ -216,6 +219,7 @@ struct page_pool {
struct ptr_ring ring;
void *mp_priv;
const struct memory_provider_ops *mp_ops;
#ifdef CONFIG_PAGE_POOL_STATS
/* recycle stats are per-cpu to avoid locking */

View File

@ -2059,6 +2059,24 @@ enum ethtool_link_mode_bit_indices {
ETHTOOL_LINK_MODE_10baseT1S_Half_BIT = 100,
ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT = 101,
ETHTOOL_LINK_MODE_10baseT1BRR_Full_BIT = 102,
ETHTOOL_LINK_MODE_200000baseCR_Full_BIT = 103,
ETHTOOL_LINK_MODE_200000baseKR_Full_BIT = 104,
ETHTOOL_LINK_MODE_200000baseDR_Full_BIT = 105,
ETHTOOL_LINK_MODE_200000baseDR_2_Full_BIT = 106,
ETHTOOL_LINK_MODE_200000baseSR_Full_BIT = 107,
ETHTOOL_LINK_MODE_200000baseVR_Full_BIT = 108,
ETHTOOL_LINK_MODE_400000baseCR2_Full_BIT = 109,
ETHTOOL_LINK_MODE_400000baseKR2_Full_BIT = 110,
ETHTOOL_LINK_MODE_400000baseDR2_Full_BIT = 111,
ETHTOOL_LINK_MODE_400000baseDR2_2_Full_BIT = 112,
ETHTOOL_LINK_MODE_400000baseSR2_Full_BIT = 113,
ETHTOOL_LINK_MODE_400000baseVR2_Full_BIT = 114,
ETHTOOL_LINK_MODE_800000baseCR4_Full_BIT = 115,
ETHTOOL_LINK_MODE_800000baseKR4_Full_BIT = 116,
ETHTOOL_LINK_MODE_800000baseDR4_Full_BIT = 117,
ETHTOOL_LINK_MODE_800000baseDR4_2_Full_BIT = 118,
ETHTOOL_LINK_MODE_800000baseSR4_Full_BIT = 119,
ETHTOOL_LINK_MODE_800000baseVR4_Full_BIT = 120,
/* must be last entry */
__ETHTOOL_LINK_MODE_MASK_NBITS

View File

@ -87,6 +87,7 @@ struct io_uring_sqe {
union {
__s32 splice_fd_in;
__u32 file_index;
__u32 zcrx_ifq_idx;
__u32 optlen;
struct {
__u16 addr_len;
@ -278,6 +279,7 @@ enum io_uring_op {
IORING_OP_FTRUNCATE,
IORING_OP_BIND,
IORING_OP_LISTEN,
IORING_OP_RECV_ZC,
/* this goes last, obviously */
IORING_OP_LAST,
@ -639,7 +641,8 @@ enum io_uring_register_op {
/* send MSG_RING without having a ring */
IORING_REGISTER_SEND_MSG_RING = 31,
/* 32 reserved for zc rx */
/* register a netdev hw rx queue for zerocopy */
IORING_REGISTER_ZCRX_IFQ = 32,
/* resize CQ ring */
IORING_REGISTER_RESIZE_RINGS = 33,
@ -956,6 +959,55 @@ enum io_uring_socket_op {
SOCKET_URING_OP_SETSOCKOPT,
};
/* Zero copy receive refill queue entry */
struct io_uring_zcrx_rqe {
__u64 off;
__u32 len;
__u32 __pad;
};
struct io_uring_zcrx_cqe {
__u64 off;
__u64 __pad;
};
/* The bit from which area id is encoded into offsets */
#define IORING_ZCRX_AREA_SHIFT 48
#define IORING_ZCRX_AREA_MASK (~(((__u64)1 << IORING_ZCRX_AREA_SHIFT) - 1))
struct io_uring_zcrx_offsets {
__u32 head;
__u32 tail;
__u32 rqes;
__u32 __resv2;
__u64 __resv[2];
};
struct io_uring_zcrx_area_reg {
__u64 addr;
__u64 len;
__u64 rq_area_token;
__u32 flags;
__u32 __resv1;
__u64 __resv2[2];
};
/*
* Argument for IORING_REGISTER_ZCRX_IFQ
*/
struct io_uring_zcrx_ifq_reg {
__u32 if_idx;
__u32 if_rxq;
__u32 rq_entries;
__u32 flags;
__u64 area_ptr; /* pointer to struct io_uring_zcrx_area_reg */
__u64 region_ptr; /* struct io_uring_region_desc * */
struct io_uring_zcrx_offsets offsets;
__u64 __resv[4];
};
#ifdef __cplusplus
}
#endif
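
Tying the new UAPI pieces together: in CQE32 mode the zcrx payload sits directly after the main CQE (the kernel writes it as (struct io_uring_zcrx_cqe *)(cqe + 1) later in this merge), and its off field packs an area id above IORING_ZCRX_AREA_SHIFT with a byte offset into that area below it. A hedged userspace decode, assuming headers that already carry these definitions and a single registered area whose base address the application kept from registration:

#include <stdint.h>
#include <linux/io_uring.h>

/*
 * Translate a zero-copy receive completion into a data pointer.
 * area_base is the start of the buffer area registered via
 * io_uring_zcrx_area_reg::addr; only area id 0 exists for now.
 */
static inline void *zcrx_cqe_to_ptr(const struct io_uring_zcrx_cqe *rcqe,
				    void *area_base)
{
	uint64_t area_id = rcqe->off >> IORING_ZCRX_AREA_SHIFT;
	uint64_t byte_off = rcqe->off & ~IORING_ZCRX_AREA_MASK;

	(void)area_id;	/* single area per ifq in this series */
	return (char *)area_base + byte_off;
}

The matching data length arrives in the main CQE's res field, with IORING_CQE_F_MORE set while the multishot request stays armed.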

View File

@ -86,6 +86,11 @@ enum {
NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
};
enum {
__NETDEV_A_IO_URING_PROVIDER_INFO_MAX,
NETDEV_A_IO_URING_PROVIDER_INFO_MAX = (__NETDEV_A_IO_URING_PROVIDER_INFO_MAX - 1)
};
enum {
NETDEV_A_PAGE_POOL_ID = 1,
NETDEV_A_PAGE_POOL_IFINDEX,
@ -94,6 +99,7 @@ enum {
NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
NETDEV_A_PAGE_POOL_DETACH_TIME,
NETDEV_A_PAGE_POOL_DMABUF,
NETDEV_A_PAGE_POOL_IO_URING,
__NETDEV_A_PAGE_POOL_MAX,
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
@ -136,6 +142,7 @@ enum {
NETDEV_A_QUEUE_TYPE,
NETDEV_A_QUEUE_NAPI_ID,
NETDEV_A_QUEUE_DMABUF,
NETDEV_A_QUEUE_IO_URING,
__NETDEV_A_QUEUE_MAX,
NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)

10
io_uring/Kconfig Normal file
View File

@ -0,0 +1,10 @@
# SPDX-License-Identifier: GPL-2.0-only
#
# io_uring configuration
#
config IO_URING_ZCRX
def_bool y
depends on PAGE_POOL
depends on INET
depends on NET_RX_BUSY_POLL

View File

@ -14,6 +14,7 @@ obj-$(CONFIG_IO_URING) += io_uring.o opdef.o kbuf.o rsrc.o notif.o \
epoll.o statx.o timeout.o fdinfo.o \
cancel.o waitid.o register.o \
truncate.o memmap.o alloc_cache.o
obj-$(CONFIG_IO_URING_ZCRX) += zcrx.o
obj-$(CONFIG_IO_WQ) += io-wq.o
obj-$(CONFIG_FUTEX) += futex.o
obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o

View File

@ -97,6 +97,7 @@
#include "uring_cmd.h"
#include "msg_ring.h"
#include "memmap.h"
#include "zcrx.h"
#include "timeout.h"
#include "poll.h"
@ -2729,6 +2730,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
mutex_lock(&ctx->uring_lock);
io_sqe_buffers_unregister(ctx);
io_sqe_files_unregister(ctx);
io_unregister_zcrx_ifqs(ctx);
io_cqring_overflow_kill(ctx);
io_eventfd_unregister(ctx);
io_free_alloc_caches(ctx);
@ -2888,6 +2890,11 @@ static __cold void io_ring_exit_work(struct work_struct *work)
io_cqring_overflow_kill(ctx);
mutex_unlock(&ctx->uring_lock);
}
if (ctx->ifq) {
mutex_lock(&ctx->uring_lock);
io_shutdown_zcrx_ifqs(ctx);
mutex_unlock(&ctx->uring_lock);
}
if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
io_move_task_work_from_local(ctx);

View File

@ -190,6 +190,16 @@ static inline bool io_get_cqe(struct io_ring_ctx *ctx, struct io_uring_cqe **ret
return io_get_cqe_overflow(ctx, ret, false);
}
static inline bool io_defer_get_uncommited_cqe(struct io_ring_ctx *ctx,
struct io_uring_cqe **cqe_ret)
{
io_lockdep_assert_cq_locked(ctx);
ctx->cq_extra++;
ctx->submit_state.cq_flush = true;
return io_get_cqe(ctx, cqe_ret);
}
static __always_inline bool io_fill_cqe_req(struct io_ring_ctx *ctx,
struct io_kiocb *req)
{

View File

@ -271,6 +271,8 @@ static struct io_mapped_region *io_mmap_get_region(struct io_ring_ctx *ctx,
return io_pbuf_get_region(ctx, bgid);
case IORING_MAP_OFF_PARAM_REGION:
return &ctx->param_region;
case IORING_MAP_OFF_ZCRX_REGION:
return &ctx->zcrx_region;
}
return NULL;
}

View File

@ -2,6 +2,7 @@
#define IO_URING_MEMMAP_H
#define IORING_MAP_OFF_PARAM_REGION 0x20000000ULL
#define IORING_MAP_OFF_ZCRX_REGION 0x30000000ULL
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);

View File

@ -16,6 +16,7 @@
#include "net.h"
#include "notif.h"
#include "rsrc.h"
#include "zcrx.h"
#if defined(CONFIG_NET)
struct io_shutdown {
@ -88,6 +89,14 @@ struct io_sr_msg {
*/
#define MULTISHOT_MAX_RETRY 32
struct io_recvzc {
struct file *file;
unsigned msg_flags;
u16 flags;
u32 len;
struct io_zcrx_ifq *ifq;
};
int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_shutdown *shutdown = io_kiocb_to_cmd(req, struct io_shutdown);
@ -1200,6 +1209,81 @@ out_free:
return ret;
}
int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc);
unsigned ifq_idx;
if (unlikely(sqe->file_index || sqe->addr2 || sqe->addr ||
sqe->addr3))
return -EINVAL;
ifq_idx = READ_ONCE(sqe->zcrx_ifq_idx);
if (ifq_idx != 0)
return -EINVAL;
zc->ifq = req->ctx->ifq;
if (!zc->ifq)
return -EINVAL;
zc->len = READ_ONCE(sqe->len);
zc->flags = READ_ONCE(sqe->ioprio);
zc->msg_flags = READ_ONCE(sqe->msg_flags);
if (zc->msg_flags)
return -EINVAL;
if (zc->flags & ~(IORING_RECVSEND_POLL_FIRST | IORING_RECV_MULTISHOT))
return -EINVAL;
/* multishot required */
if (!(zc->flags & IORING_RECV_MULTISHOT))
return -EINVAL;
/* All data completions are posted as aux CQEs. */
req->flags |= REQ_F_APOLL_MULTISHOT;
return 0;
}
int io_recvzc(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc);
struct socket *sock;
unsigned int len;
int ret;
if (!(req->flags & REQ_F_POLLED) &&
(zc->flags & IORING_RECVSEND_POLL_FIRST))
return -EAGAIN;
sock = sock_from_file(req->file);
if (unlikely(!sock))
return -ENOTSOCK;
len = zc->len;
ret = io_zcrx_recv(req, zc->ifq, sock, zc->msg_flags | MSG_DONTWAIT,
issue_flags, &zc->len);
if (len && zc->len == 0) {
io_req_set_res(req, 0, 0);
if (issue_flags & IO_URING_F_MULTISHOT)
return IOU_STOP_MULTISHOT;
return IOU_OK;
}
if (unlikely(ret <= 0) && ret != -EAGAIN) {
if (ret == -ERESTARTSYS)
ret = -EINTR;
if (ret == IOU_REQUEUE)
return IOU_REQUEUE;
req_set_fail(req);
io_req_set_res(req, ret, 0);
if (issue_flags & IO_URING_F_MULTISHOT)
return IOU_STOP_MULTISHOT;
return IOU_OK;
}
if (issue_flags & IO_URING_F_MULTISHOT)
return IOU_ISSUE_SKIP_COMPLETE;
return -EAGAIN;
}
void io_send_zc_cleanup(struct io_kiocb *req)
{
struct io_sr_msg *zc = io_kiocb_to_cmd(req, struct io_sr_msg);

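Matching the constraints io_recvzc_prep() enforces above (multishot is mandatory, msg_flags must stay zero, only ifq index 0 is accepted, and the ring needs DEFER_TASKRUN plus CQE32 with a registered ifq), a hedged sketch of how userspace might queue the request by filling the raw SQE fields; only generic liburing calls are assumed, and the headers must be new enough to define IORING_OP_RECV_ZC and zcrx_ifq_idx:

#include <errno.h>
#include <string.h>
#include <liburing.h>

/*
 * Queue a multishot zero-copy receive on a connected TCP socket `fd`.
 * len == 0 means "no read limit", mirroring zc->len in io_recvzc_prep().
 */
static int queue_recv_zc(struct io_uring *ring, int fd, unsigned int len)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	if (!sqe)
		return -ENOMEM;

	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_RECV_ZC;
	sqe->fd = fd;
	sqe->len = len;				/* optional read limit */
	sqe->ioprio = IORING_RECV_MULTISHOT;	/* required by the prep code */
	sqe->zcrx_ifq_idx = 0;			/* only index 0 for now */
	/* msg_flags, addr, addr2, addr3 and file_index stay zero */
	return io_uring_submit(ring);
}
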
View File

@ -37,6 +37,7 @@
#include "waitid.h"
#include "futex.h"
#include "truncate.h"
#include "zcrx.h"
static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags)
{
@ -514,6 +515,18 @@ const struct io_issue_def io_issue_defs[] = {
.async_size = sizeof(struct io_async_msghdr),
#else
.prep = io_eopnotsupp_prep,
#endif
},
[IORING_OP_RECV_ZC] = {
.needs_file = 1,
.unbound_nonreg_file = 1,
.pollin = 1,
.ioprio = 1,
#if defined(CONFIG_NET)
.prep = io_recvzc_prep,
.issue = io_recvzc,
#else
.prep = io_eopnotsupp_prep,
#endif
},
};
@ -745,6 +758,9 @@ const struct io_cold_def io_cold_defs[] = {
[IORING_OP_LISTEN] = {
.name = "LISTEN",
},
[IORING_OP_RECV_ZC] = {
.name = "RECV_ZC",
},
};
const char *io_uring_get_opcode(u8 opcode)

View File

@ -30,6 +30,7 @@
#include "eventfd.h"
#include "msg_ring.h"
#include "memmap.h"
#include "zcrx.h"
#define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \
IORING_REGISTER_LAST + IORING_OP_LAST)
@ -813,6 +814,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
break;
ret = io_register_clone_buffers(ctx, arg);
break;
case IORING_REGISTER_ZCRX_IFQ:
ret = -EINVAL;
if (!arg || nr_args != 1)
break;
ret = io_register_zcrx_ifq(ctx, arg);
break;
case IORING_REGISTER_RESIZE_RINGS:
ret = -EINVAL;
if (!arg || nr_args != 1)

View File

@ -80,7 +80,7 @@ static int io_account_mem(struct io_ring_ctx *ctx, unsigned long nr_pages)
return 0;
}
static int io_buffer_validate(struct iovec *iov)
int io_buffer_validate(struct iovec *iov)
{
unsigned long tmp, acct_len = iov->iov_len + (PAGE_SIZE - 1);

View File

@ -76,6 +76,7 @@ int io_register_rsrc_update(struct io_ring_ctx *ctx, void __user *arg,
unsigned size, unsigned type);
int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg,
unsigned int size, unsigned int type);
int io_buffer_validate(struct iovec *iov);
bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
struct io_imu_folio_data *data);

960
io_uring/zcrx.c Normal file
View File

@ -0,0 +1,960 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/dma-map-ops.h>
#include <linux/mm.h>
#include <linux/nospec.h>
#include <linux/io_uring.h>
#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
#include <linux/skbuff_ref.h>
#include <net/page_pool/helpers.h>
#include <net/page_pool/memory_provider.h>
#include <net/netlink.h>
#include <net/netdev_rx_queue.h>
#include <net/tcp.h>
#include <net/rps.h>
#include <trace/events/page_pool.h>
#include <uapi/linux/io_uring.h>
#include "io_uring.h"
#include "kbuf.h"
#include "memmap.h"
#include "zcrx.h"
#include "rsrc.h"
#define IO_DMA_ATTR (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
struct io_zcrx_area *area, int nr_mapped)
{
int i;
for (i = 0; i < nr_mapped; i++) {
struct net_iov *niov = &area->nia.niovs[i];
dma_addr_t dma;
dma = page_pool_get_dma_addr_netmem(net_iov_to_netmem(niov));
dma_unmap_page_attrs(ifq->dev, dma, PAGE_SIZE,
DMA_FROM_DEVICE, IO_DMA_ATTR);
net_mp_niov_set_dma_addr(niov, 0);
}
}
static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
{
if (area->is_mapped)
__io_zcrx_unmap_area(ifq, area, area->nia.num_niovs);
}
static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
{
int i;
for (i = 0; i < area->nia.num_niovs; i++) {
struct net_iov *niov = &area->nia.niovs[i];
dma_addr_t dma;
dma = dma_map_page_attrs(ifq->dev, area->pages[i], 0, PAGE_SIZE,
DMA_FROM_DEVICE, IO_DMA_ATTR);
if (dma_mapping_error(ifq->dev, dma))
break;
if (net_mp_niov_set_dma_addr(niov, dma)) {
dma_unmap_page_attrs(ifq->dev, dma, PAGE_SIZE,
DMA_FROM_DEVICE, IO_DMA_ATTR);
break;
}
}
if (i != area->nia.num_niovs) {
__io_zcrx_unmap_area(ifq, area, i);
return -EINVAL;
}
area->is_mapped = true;
return 0;
}
static void io_zcrx_sync_for_device(const struct page_pool *pool,
struct net_iov *niov)
{
#if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC)
dma_addr_t dma_addr;
if (!dma_dev_need_sync(pool->p.dev))
return;
dma_addr = page_pool_get_dma_addr_netmem(net_iov_to_netmem(niov));
__dma_sync_single_for_device(pool->p.dev, dma_addr + pool->p.offset,
PAGE_SIZE, pool->p.dma_dir);
#endif
}
#define IO_RQ_MAX_ENTRIES 32768
#define IO_SKBS_PER_CALL_LIMIT 20
struct io_zcrx_args {
struct io_kiocb *req;
struct io_zcrx_ifq *ifq;
struct socket *sock;
unsigned nr_skbs;
};
static const struct memory_provider_ops io_uring_pp_zc_ops;
static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *niov)
{
struct net_iov_area *owner = net_iov_owner(niov);
return container_of(owner, struct io_zcrx_area, nia);
}
static inline atomic_t *io_get_user_counter(struct net_iov *niov)
{
struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
return &area->user_refs[net_iov_idx(niov)];
}
static bool io_zcrx_put_niov_uref(struct net_iov *niov)
{
atomic_t *uref = io_get_user_counter(niov);
if (unlikely(!atomic_read(uref)))
return false;
atomic_dec(uref);
return true;
}
static void io_zcrx_get_niov_uref(struct net_iov *niov)
{
atomic_inc(io_get_user_counter(niov));
}
static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
{
struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
return area->pages[net_iov_idx(niov)];
}
static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
struct io_uring_zcrx_ifq_reg *reg,
struct io_uring_region_desc *rd)
{
size_t off, size;
void *ptr;
int ret;
off = sizeof(struct io_uring);
size = off + sizeof(struct io_uring_zcrx_rqe) * reg->rq_entries;
if (size > rd->size)
return -EINVAL;
ret = io_create_region_mmap_safe(ifq->ctx, &ifq->ctx->zcrx_region, rd,
IORING_MAP_OFF_ZCRX_REGION);
if (ret < 0)
return ret;
ptr = io_region_get_ptr(&ifq->ctx->zcrx_region);
ifq->rq_ring = (struct io_uring *)ptr;
ifq->rqes = (struct io_uring_zcrx_rqe *)(ptr + off);
return 0;
}
static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq)
{
io_free_region(ifq->ctx, &ifq->ctx->zcrx_region);
ifq->rq_ring = NULL;
ifq->rqes = NULL;
}
static void io_zcrx_free_area(struct io_zcrx_area *area)
{
io_zcrx_unmap_area(area->ifq, area);
kvfree(area->freelist);
kvfree(area->nia.niovs);
kvfree(area->user_refs);
if (area->pages) {
unpin_user_pages(area->pages, area->nia.num_niovs);
kvfree(area->pages);
}
kfree(area);
}
static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
struct io_zcrx_area **res,
struct io_uring_zcrx_area_reg *area_reg)
{
struct io_zcrx_area *area;
int i, ret, nr_pages;
struct iovec iov;
if (area_reg->flags || area_reg->rq_area_token)
return -EINVAL;
if (area_reg->__resv1 || area_reg->__resv2[0] || area_reg->__resv2[1])
return -EINVAL;
if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK)
return -EINVAL;
iov.iov_base = u64_to_user_ptr(area_reg->addr);
iov.iov_len = area_reg->len;
ret = io_buffer_validate(&iov);
if (ret)
return ret;
ret = -ENOMEM;
area = kzalloc(sizeof(*area), GFP_KERNEL);
if (!area)
goto err;
area->pages = io_pin_pages((unsigned long)area_reg->addr, area_reg->len,
&nr_pages);
if (IS_ERR(area->pages)) {
ret = PTR_ERR(area->pages);
area->pages = NULL;
goto err;
}
area->nia.num_niovs = nr_pages;
area->nia.niovs = kvmalloc_array(nr_pages, sizeof(area->nia.niovs[0]),
GFP_KERNEL | __GFP_ZERO);
if (!area->nia.niovs)
goto err;
area->freelist = kvmalloc_array(nr_pages, sizeof(area->freelist[0]),
GFP_KERNEL | __GFP_ZERO);
if (!area->freelist)
goto err;
for (i = 0; i < nr_pages; i++)
area->freelist[i] = i;
area->user_refs = kvmalloc_array(nr_pages, sizeof(area->user_refs[0]),
GFP_KERNEL | __GFP_ZERO);
if (!area->user_refs)
goto err;
for (i = 0; i < nr_pages; i++) {
struct net_iov *niov = &area->nia.niovs[i];
niov->owner = &area->nia;
area->freelist[i] = i;
atomic_set(&area->user_refs[i], 0);
}
area->free_count = nr_pages;
area->ifq = ifq;
/* we're only supporting one area per ifq for now */
area->area_id = 0;
area_reg->rq_area_token = (u64)area->area_id << IORING_ZCRX_AREA_SHIFT;
spin_lock_init(&area->freelist_lock);
*res = area;
return 0;
err:
if (area)
io_zcrx_free_area(area);
return ret;
}
static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx)
{
struct io_zcrx_ifq *ifq;
ifq = kzalloc(sizeof(*ifq), GFP_KERNEL);
if (!ifq)
return NULL;
ifq->if_rxq = -1;
ifq->ctx = ctx;
spin_lock_init(&ifq->lock);
spin_lock_init(&ifq->rq_lock);
return ifq;
}
static void io_zcrx_drop_netdev(struct io_zcrx_ifq *ifq)
{
spin_lock(&ifq->lock);
if (ifq->netdev) {
netdev_put(ifq->netdev, &ifq->netdev_tracker);
ifq->netdev = NULL;
}
spin_unlock(&ifq->lock);
}
static void io_close_queue(struct io_zcrx_ifq *ifq)
{
struct net_device *netdev;
netdevice_tracker netdev_tracker;
struct pp_memory_provider_params p = {
.mp_ops = &io_uring_pp_zc_ops,
.mp_priv = ifq,
};
if (ifq->if_rxq == -1)
return;
spin_lock(&ifq->lock);
netdev = ifq->netdev;
netdev_tracker = ifq->netdev_tracker;
ifq->netdev = NULL;
spin_unlock(&ifq->lock);
if (netdev) {
net_mp_close_rxq(netdev, ifq->if_rxq, &p);
netdev_put(netdev, &netdev_tracker);
}
ifq->if_rxq = -1;
}
static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq)
{
io_close_queue(ifq);
io_zcrx_drop_netdev(ifq);
if (ifq->area)
io_zcrx_free_area(ifq->area);
if (ifq->dev)
put_device(ifq->dev);
io_free_rbuf_ring(ifq);
kfree(ifq);
}
int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
struct io_uring_zcrx_ifq_reg __user *arg)
{
struct pp_memory_provider_params mp_param = {};
struct io_uring_zcrx_area_reg area;
struct io_uring_zcrx_ifq_reg reg;
struct io_uring_region_desc rd;
struct io_zcrx_ifq *ifq;
int ret;
/*
* 1. Interface queue allocation.
* 2. It can observe data destined for sockets of other tasks.
*/
if (!capable(CAP_NET_ADMIN))
return -EPERM;
/* mandatory io_uring features for zc rx */
if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN &&
ctx->flags & IORING_SETUP_CQE32))
return -EINVAL;
if (ctx->ifq)
return -EBUSY;
if (copy_from_user(&reg, arg, sizeof(reg)))
return -EFAULT;
if (copy_from_user(&rd, u64_to_user_ptr(reg.region_ptr), sizeof(rd)))
return -EFAULT;
if (memchr_inv(&reg.__resv, 0, sizeof(reg.__resv)))
return -EINVAL;
if (reg.if_rxq == -1 || !reg.rq_entries || reg.flags)
return -EINVAL;
if (reg.rq_entries > IO_RQ_MAX_ENTRIES) {
if (!(ctx->flags & IORING_SETUP_CLAMP))
return -EINVAL;
reg.rq_entries = IO_RQ_MAX_ENTRIES;
}
reg.rq_entries = roundup_pow_of_two(reg.rq_entries);
if (copy_from_user(&area, u64_to_user_ptr(reg.area_ptr), sizeof(area)))
return -EFAULT;
ifq = io_zcrx_ifq_alloc(ctx);
if (!ifq)
return -ENOMEM;
ret = io_allocate_rbuf_ring(ifq, &reg, &rd);
if (ret)
goto err;
ret = io_zcrx_create_area(ifq, &ifq->area, &area);
if (ret)
goto err;
ifq->rq_entries = reg.rq_entries;
ret = -ENODEV;
ifq->netdev = netdev_get_by_index(current->nsproxy->net_ns, reg.if_idx,
&ifq->netdev_tracker, GFP_KERNEL);
if (!ifq->netdev)
goto err;
ifq->dev = ifq->netdev->dev.parent;
ret = -EOPNOTSUPP;
if (!ifq->dev)
goto err;
get_device(ifq->dev);
ret = io_zcrx_map_area(ifq, ifq->area);
if (ret)
goto err;
mp_param.mp_ops = &io_uring_pp_zc_ops;
mp_param.mp_priv = ifq;
ret = net_mp_open_rxq(ifq->netdev, reg.if_rxq, &mp_param);
if (ret)
goto err;
ifq->if_rxq = reg.if_rxq;
reg.offsets.rqes = sizeof(struct io_uring);
reg.offsets.head = offsetof(struct io_uring, head);
reg.offsets.tail = offsetof(struct io_uring, tail);
if (copy_to_user(arg, &reg, sizeof(reg)) ||
copy_to_user(u64_to_user_ptr(reg.region_ptr), &rd, sizeof(rd)) ||
copy_to_user(u64_to_user_ptr(reg.area_ptr), &area, sizeof(area))) {
ret = -EFAULT;
goto err;
}
ctx->ifq = ifq;
return 0;
err:
io_zcrx_ifq_free(ifq);
return ret;
}
void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx)
{
struct io_zcrx_ifq *ifq = ctx->ifq;
lockdep_assert_held(&ctx->uring_lock);
if (!ifq)
return;
ctx->ifq = NULL;
io_zcrx_ifq_free(ifq);
}
static struct net_iov *__io_zcrx_get_free_niov(struct io_zcrx_area *area)
{
unsigned niov_idx;
lockdep_assert_held(&area->freelist_lock);
niov_idx = area->freelist[--area->free_count];
return &area->nia.niovs[niov_idx];
}
static void io_zcrx_return_niov_freelist(struct net_iov *niov)
{
struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
spin_lock_bh(&area->freelist_lock);
area->freelist[area->free_count++] = net_iov_idx(niov);
spin_unlock_bh(&area->freelist_lock);
}
static void io_zcrx_return_niov(struct net_iov *niov)
{
netmem_ref netmem = net_iov_to_netmem(niov);
if (!niov->pp) {
/* copy fallback allocated niovs */
io_zcrx_return_niov_freelist(niov);
return;
}
page_pool_put_unrefed_netmem(niov->pp, netmem, -1, false);
}
static void io_zcrx_scrub(struct io_zcrx_ifq *ifq)
{
struct io_zcrx_area *area = ifq->area;
int i;
if (!area)
return;
/* Reclaim back all buffers given to the user space. */
for (i = 0; i < area->nia.num_niovs; i++) {
struct net_iov *niov = &area->nia.niovs[i];
int nr;
if (!atomic_read(io_get_user_counter(niov)))
continue;
nr = atomic_xchg(io_get_user_counter(niov), 0);
if (nr && !page_pool_unref_netmem(net_iov_to_netmem(niov), nr))
io_zcrx_return_niov(niov);
}
}
void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx)
{
lockdep_assert_held(&ctx->uring_lock);
if (!ctx->ifq)
return;
io_zcrx_scrub(ctx->ifq);
io_close_queue(ctx->ifq);
}
static inline u32 io_zcrx_rqring_entries(struct io_zcrx_ifq *ifq)
{
u32 entries;
entries = smp_load_acquire(&ifq->rq_ring->tail) - ifq->cached_rq_head;
return min(entries, ifq->rq_entries);
}
static struct io_uring_zcrx_rqe *io_zcrx_get_rqe(struct io_zcrx_ifq *ifq,
unsigned mask)
{
unsigned int idx = ifq->cached_rq_head++ & mask;
return &ifq->rqes[idx];
}
static void io_zcrx_ring_refill(struct page_pool *pp,
struct io_zcrx_ifq *ifq)
{
unsigned int mask = ifq->rq_entries - 1;
unsigned int entries;
netmem_ref netmem;
spin_lock_bh(&ifq->rq_lock);
entries = io_zcrx_rqring_entries(ifq);
entries = min_t(unsigned, entries, PP_ALLOC_CACHE_REFILL - pp->alloc.count);
if (unlikely(!entries)) {
spin_unlock_bh(&ifq->rq_lock);
return;
}
do {
struct io_uring_zcrx_rqe *rqe = io_zcrx_get_rqe(ifq, mask);
struct io_zcrx_area *area;
struct net_iov *niov;
unsigned niov_idx, area_idx;
area_idx = rqe->off >> IORING_ZCRX_AREA_SHIFT;
niov_idx = (rqe->off & ~IORING_ZCRX_AREA_MASK) >> PAGE_SHIFT;
if (unlikely(rqe->__pad || area_idx))
continue;
area = ifq->area;
if (unlikely(niov_idx >= area->nia.num_niovs))
continue;
niov_idx = array_index_nospec(niov_idx, area->nia.num_niovs);
niov = &area->nia.niovs[niov_idx];
if (!io_zcrx_put_niov_uref(niov))
continue;
netmem = net_iov_to_netmem(niov);
if (page_pool_unref_netmem(netmem, 1) != 0)
continue;
if (unlikely(niov->pp != pp)) {
io_zcrx_return_niov(niov);
continue;
}
io_zcrx_sync_for_device(pp, niov);
net_mp_netmem_place_in_cache(pp, netmem);
} while (--entries);
smp_store_release(&ifq->rq_ring->head, ifq->cached_rq_head);
spin_unlock_bh(&ifq->rq_lock);
}
static void io_zcrx_refill_slow(struct page_pool *pp, struct io_zcrx_ifq *ifq)
{
struct io_zcrx_area *area = ifq->area;
spin_lock_bh(&area->freelist_lock);
while (area->free_count && pp->alloc.count < PP_ALLOC_CACHE_REFILL) {
struct net_iov *niov = __io_zcrx_get_free_niov(area);
netmem_ref netmem = net_iov_to_netmem(niov);
net_mp_niov_set_page_pool(pp, niov);
io_zcrx_sync_for_device(pp, niov);
net_mp_netmem_place_in_cache(pp, netmem);
}
spin_unlock_bh(&area->freelist_lock);
}
static netmem_ref io_pp_zc_alloc_netmems(struct page_pool *pp, gfp_t gfp)
{
struct io_zcrx_ifq *ifq = pp->mp_priv;
/* pp should already be ensuring that */
if (unlikely(pp->alloc.count))
goto out_return;
io_zcrx_ring_refill(pp, ifq);
if (likely(pp->alloc.count))
goto out_return;
io_zcrx_refill_slow(pp, ifq);
if (!pp->alloc.count)
return 0;
out_return:
return pp->alloc.cache[--pp->alloc.count];
}
static bool io_pp_zc_release_netmem(struct page_pool *pp, netmem_ref netmem)
{
struct net_iov *niov;
if (WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
return false;
niov = netmem_to_net_iov(netmem);
net_mp_niov_clear_page_pool(niov);
io_zcrx_return_niov_freelist(niov);
return false;
}
static int io_pp_zc_init(struct page_pool *pp)
{
struct io_zcrx_ifq *ifq = pp->mp_priv;
if (WARN_ON_ONCE(!ifq))
return -EINVAL;
if (WARN_ON_ONCE(ifq->dev != pp->p.dev))
return -EINVAL;
if (WARN_ON_ONCE(!pp->dma_map))
return -EOPNOTSUPP;
if (pp->p.order != 0)
return -EOPNOTSUPP;
if (pp->p.dma_dir != DMA_FROM_DEVICE)
return -EOPNOTSUPP;
percpu_ref_get(&ifq->ctx->refs);
return 0;
}
static void io_pp_zc_destroy(struct page_pool *pp)
{
struct io_zcrx_ifq *ifq = pp->mp_priv;
struct io_zcrx_area *area = ifq->area;
if (WARN_ON_ONCE(area->free_count != area->nia.num_niovs))
return;
percpu_ref_put(&ifq->ctx->refs);
}
static int io_pp_nl_fill(void *mp_priv, struct sk_buff *rsp,
struct netdev_rx_queue *rxq)
{
struct nlattr *nest;
int type;
type = rxq ? NETDEV_A_QUEUE_IO_URING : NETDEV_A_PAGE_POOL_IO_URING;
nest = nla_nest_start(rsp, type);
if (!nest)
return -EMSGSIZE;
nla_nest_end(rsp, nest);
return 0;
}
static void io_pp_uninstall(void *mp_priv, struct netdev_rx_queue *rxq)
{
struct pp_memory_provider_params *p = &rxq->mp_params;
struct io_zcrx_ifq *ifq = mp_priv;
io_zcrx_drop_netdev(ifq);
p->mp_ops = NULL;
p->mp_priv = NULL;
}
static const struct memory_provider_ops io_uring_pp_zc_ops = {
.alloc_netmems = io_pp_zc_alloc_netmems,
.release_netmem = io_pp_zc_release_netmem,
.init = io_pp_zc_init,
.destroy = io_pp_zc_destroy,
.nl_fill = io_pp_nl_fill,
.uninstall = io_pp_uninstall,
};
static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov,
struct io_zcrx_ifq *ifq, int off, int len)
{
struct io_uring_zcrx_cqe *rcqe;
struct io_zcrx_area *area;
struct io_uring_cqe *cqe;
u64 offset;
if (!io_defer_get_uncommited_cqe(req->ctx, &cqe))
return false;
cqe->user_data = req->cqe.user_data;
cqe->res = len;
cqe->flags = IORING_CQE_F_MORE;
area = io_zcrx_iov_to_area(niov);
offset = off + (net_iov_idx(niov) << PAGE_SHIFT);
rcqe = (struct io_uring_zcrx_cqe *)(cqe + 1);
rcqe->off = offset + ((u64)area->area_id << IORING_ZCRX_AREA_SHIFT);
rcqe->__pad = 0;
return true;
}
static struct net_iov *io_zcrx_alloc_fallback(struct io_zcrx_area *area)
{
struct net_iov *niov = NULL;
spin_lock_bh(&area->freelist_lock);
if (area->free_count)
niov = __io_zcrx_get_free_niov(area);
spin_unlock_bh(&area->freelist_lock);
if (niov)
page_pool_fragment_netmem(net_iov_to_netmem(niov), 1);
return niov;
}
static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
void *src_base, struct page *src_page,
unsigned int src_offset, size_t len)
{
struct io_zcrx_area *area = ifq->area;
size_t copied = 0;
int ret = 0;
while (len) {
size_t copy_size = min_t(size_t, PAGE_SIZE, len);
const int dst_off = 0;
struct net_iov *niov;
struct page *dst_page;
void *dst_addr;
niov = io_zcrx_alloc_fallback(area);
if (!niov) {
ret = -ENOMEM;
break;
}
dst_page = io_zcrx_iov_page(niov);
dst_addr = kmap_local_page(dst_page);
if (src_page)
src_base = kmap_local_page(src_page);
memcpy(dst_addr, src_base + src_offset, copy_size);
if (src_page)
kunmap_local(src_base);
kunmap_local(dst_addr);
if (!io_zcrx_queue_cqe(req, niov, ifq, dst_off, copy_size)) {
io_zcrx_return_niov(niov);
ret = -ENOSPC;
break;
}
io_zcrx_get_niov_uref(niov);
src_offset += copy_size;
len -= copy_size;
copied += copy_size;
}
return copied ? copied : ret;
}
static int io_zcrx_copy_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
const skb_frag_t *frag, int off, int len)
{
struct page *page = skb_frag_page(frag);
u32 p_off, p_len, t, copied = 0;
int ret = 0;
off += skb_frag_off(frag);
skb_frag_foreach_page(frag, off, len,
page, p_off, p_len, t) {
ret = io_zcrx_copy_chunk(req, ifq, NULL, page, p_off, p_len);
if (ret < 0)
return copied ? copied : ret;
copied += ret;
}
return copied;
}
static int io_zcrx_recv_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
const skb_frag_t *frag, int off, int len)
{
struct net_iov *niov;
if (unlikely(!skb_frag_is_net_iov(frag)))
return io_zcrx_copy_frag(req, ifq, frag, off, len);
niov = netmem_to_net_iov(frag->netmem);
if (niov->pp->mp_ops != &io_uring_pp_zc_ops ||
niov->pp->mp_priv != ifq)
return -EFAULT;
if (!io_zcrx_queue_cqe(req, niov, ifq, off + skb_frag_off(frag), len))
return -ENOSPC;
/*
* Prevent it from being recycled while user is accessing it.
* It has to be done before grabbing a user reference.
*/
page_pool_ref_netmem(net_iov_to_netmem(niov));
io_zcrx_get_niov_uref(niov);
return len;
}
static int
io_zcrx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb,
unsigned int offset, size_t len)
{
struct io_zcrx_args *args = desc->arg.data;
struct io_zcrx_ifq *ifq = args->ifq;
struct io_kiocb *req = args->req;
struct sk_buff *frag_iter;
unsigned start, start_off = offset;
int i, copy, end, off;
int ret = 0;
len = min_t(size_t, len, desc->count);
if (unlikely(args->nr_skbs++ > IO_SKBS_PER_CALL_LIMIT))
return -EAGAIN;
if (unlikely(offset < skb_headlen(skb))) {
ssize_t copied;
size_t to_copy;
to_copy = min_t(size_t, skb_headlen(skb) - offset, len);
copied = io_zcrx_copy_chunk(req, ifq, skb->data, NULL,
offset, to_copy);
if (copied < 0) {
ret = copied;
goto out;
}
offset += copied;
len -= copied;
if (!len)
goto out;
if (offset != skb_headlen(skb))
goto out;
}
start = skb_headlen(skb);
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
const skb_frag_t *frag;
if (WARN_ON(start > offset + len))
return -EFAULT;
frag = &skb_shinfo(skb)->frags[i];
end = start + skb_frag_size(frag);
if (offset < end) {
copy = end - offset;
if (copy > len)
copy = len;
off = offset - start;
ret = io_zcrx_recv_frag(req, ifq, frag, off, copy);
if (ret < 0)
goto out;
offset += ret;
len -= ret;
if (len == 0 || ret != copy)
goto out;
}
start = end;
}
skb_walk_frags(skb, frag_iter) {
if (WARN_ON(start > offset + len))
return -EFAULT;
end = start + frag_iter->len;
if (offset < end) {
copy = end - offset;
if (copy > len)
copy = len;
off = offset - start;
ret = io_zcrx_recv_skb(desc, frag_iter, off, copy);
if (ret < 0)
goto out;
offset += ret;
len -= ret;
if (len == 0 || ret != copy)
goto out;
}
start = end;
}
out:
if (offset == start_off)
return ret;
desc->count -= (offset - start_off);
return offset - start_off;
}
static int io_zcrx_tcp_recvmsg(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
struct sock *sk, int flags,
unsigned issue_flags, unsigned int *outlen)
{
unsigned int len = *outlen;
struct io_zcrx_args args = {
.req = req,
.ifq = ifq,
.sock = sk->sk_socket,
};
read_descriptor_t rd_desc = {
.count = len ? len : UINT_MAX,
.arg.data = &args,
};
int ret;
lock_sock(sk);
ret = tcp_read_sock(sk, &rd_desc, io_zcrx_recv_skb);
if (len && ret > 0)
*outlen = len - ret;
if (ret <= 0) {
if (ret < 0 || sock_flag(sk, SOCK_DONE))
goto out;
if (sk->sk_err)
ret = sock_error(sk);
else if (sk->sk_shutdown & RCV_SHUTDOWN)
goto out;
else if (sk->sk_state == TCP_CLOSE)
ret = -ENOTCONN;
else
ret = -EAGAIN;
} else if (unlikely(args.nr_skbs > IO_SKBS_PER_CALL_LIMIT) &&
(issue_flags & IO_URING_F_MULTISHOT)) {
ret = IOU_REQUEUE;
} else if (sock_flag(sk, SOCK_DONE)) {
/* Make it retry until it finally gets 0. */
if (issue_flags & IO_URING_F_MULTISHOT)
ret = IOU_REQUEUE;
else
ret = -EAGAIN;
}
out:
release_sock(sk);
return ret;
}
int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
struct socket *sock, unsigned int flags,
unsigned issue_flags, unsigned int *len)
{
struct sock *sk = sock->sk;
const struct proto *prot = READ_ONCE(sk->sk_prot);
if (prot->recvmsg != tcp_recvmsg)
return -EPROTONOSUPPORT;
sock_rps_record_flow(sk);
return io_zcrx_tcp_recvmsg(req, ifq, sk, flags, issue_flags, len);
}

io_uring/zcrx.h (new file)

@ -0,0 +1,73 @@
// SPDX-License-Identifier: GPL-2.0
#ifndef IOU_ZC_RX_H
#define IOU_ZC_RX_H
#include <linux/io_uring_types.h>
#include <linux/socket.h>
#include <net/page_pool/types.h>
#include <net/net_trackers.h>
struct io_zcrx_area {
struct net_iov_area nia;
struct io_zcrx_ifq *ifq;
atomic_t *user_refs;
bool is_mapped;
u16 area_id;
struct page **pages;
/* freelist */
spinlock_t freelist_lock ____cacheline_aligned_in_smp;
u32 free_count;
u32 *freelist;
};
struct io_zcrx_ifq {
struct io_ring_ctx *ctx;
struct io_zcrx_area *area;
struct io_uring *rq_ring;
struct io_uring_zcrx_rqe *rqes;
u32 rq_entries;
u32 cached_rq_head;
spinlock_t rq_lock;
u32 if_rxq;
struct device *dev;
struct net_device *netdev;
netdevice_tracker netdev_tracker;
spinlock_t lock;
};
#if defined(CONFIG_IO_URING_ZCRX)
int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
struct io_uring_zcrx_ifq_reg __user *arg);
void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx);
void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx);
int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
struct socket *sock, unsigned int flags,
unsigned issue_flags, unsigned int *len);
#else
static inline int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
struct io_uring_zcrx_ifq_reg __user *arg)
{
return -EOPNOTSUPP;
}
static inline void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx)
{
}
static inline void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx)
{
}
static inline int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
struct socket *sock, unsigned int flags,
unsigned issue_flags, unsigned int *len)
{
return -EOPNOTSUPP;
}
#endif
int io_recvzc(struct io_kiocb *req, unsigned int issue_flags);
int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
#endif


@ -1040,7 +1040,7 @@ static int br_mdb_add_group(const struct br_mdb_config *cfg,
/* host join */
if (!port) {
if (mp->host_joined) {
if (mp->host_joined && !(cfg->nlflags & NLM_F_REPLACE)) {
NL_SET_ERR_MSG_MOD(extack, "Group is already joined by host");
return -EEXIST;
}


@ -159,6 +159,7 @@
#include <net/netdev_rx_queue.h>
#include <net/page_pool/types.h>
#include <net/page_pool/helpers.h>
#include <net/page_pool/memory_provider.h>
#include <net/rps.h>
#include <linux/phy_link_topology.h>
@ -6184,16 +6185,18 @@ EXPORT_SYMBOL(netif_receive_skb_list);
static void flush_backlog(struct work_struct *work)
{
struct sk_buff *skb, *tmp;
struct sk_buff_head list;
struct softnet_data *sd;
__skb_queue_head_init(&list);
local_bh_disable();
sd = this_cpu_ptr(&softnet_data);
backlog_lock_irq_disable(sd);
skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) {
if (skb->dev->reg_state == NETREG_UNREGISTERING) {
if (READ_ONCE(skb->dev->reg_state) == NETREG_UNREGISTERING) {
__skb_unlink(skb, &sd->input_pkt_queue);
dev_kfree_skb_irq(skb);
__skb_queue_tail(&list, skb);
rps_input_queue_head_incr(sd);
}
}
@ -6201,14 +6204,16 @@ static void flush_backlog(struct work_struct *work)
local_lock_nested_bh(&softnet_data.process_queue_bh_lock);
skb_queue_walk_safe(&sd->process_queue, skb, tmp) {
if (skb->dev->reg_state == NETREG_UNREGISTERING) {
if (READ_ONCE(skb->dev->reg_state) == NETREG_UNREGISTERING) {
__skb_unlink(skb, &sd->process_queue);
kfree_skb(skb);
__skb_queue_tail(&list, skb);
rps_input_queue_head_incr(sd);
}
}
local_unlock_nested_bh(&softnet_data.process_queue_bh_lock);
local_bh_enable();
__skb_queue_purge_reason(&list, SKB_DROP_REASON_DEV_READY);
}
static bool flush_required(int cpu)
@ -7153,6 +7158,9 @@ void __netif_napi_del_locked(struct napi_struct *napi)
if (!test_and_clear_bit(NAPI_STATE_LISTED, &napi->state))
return;
/* Make sure NAPI is disabled (or was never enabled). */
WARN_ON(!test_bit(NAPI_STATE_SCHED, &napi->state));
if (napi->config) {
napi->index = -1;
napi->config = NULL;
@ -11820,6 +11828,19 @@ void unregister_netdevice_queue(struct net_device *dev, struct list_head *head)
}
EXPORT_SYMBOL(unregister_netdevice_queue);
static void dev_memory_provider_uninstall(struct net_device *dev)
{
unsigned int i;
for (i = 0; i < dev->real_num_rx_queues; i++) {
struct netdev_rx_queue *rxq = &dev->_rx[i];
struct pp_memory_provider_params *p = &rxq->mp_params;
if (p->mp_ops && p->mp_ops->uninstall)
p->mp_ops->uninstall(rxq->mp_params.mp_priv, rxq);
}
}
void unregister_netdevice_many_notify(struct list_head *head,
u32 portid, const struct nlmsghdr *nlh)
{
@ -11874,7 +11895,7 @@ void unregister_netdevice_many_notify(struct list_head *head,
dev_tcx_uninstall(dev);
dev_xdp_uninstall(dev);
bpf_dev_bound_netdev_unregister(dev);
dev_dmabuf_uninstall(dev);
dev_memory_provider_uninstall(dev);
netdev_offload_xstats_disable_all(dev);


@ -16,6 +16,7 @@
#include <net/netdev_queues.h>
#include <net/netdev_rx_queue.h>
#include <net/page_pool/helpers.h>
#include <net/page_pool/memory_provider.h>
#include <trace/events/page_pool.h>
#include "devmem.h"
@ -27,20 +28,28 @@
/* Protected by rtnl_lock() */
static DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1);
static const struct memory_provider_ops dmabuf_devmem_ops;
bool net_is_devmem_iov(struct net_iov *niov)
{
return niov->pp->mp_ops == &dmabuf_devmem_ops;
}
static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
struct gen_pool_chunk *chunk,
void *not_used)
{
struct dmabuf_genpool_chunk_owner *owner = chunk->owner;
kvfree(owner->niovs);
kvfree(owner->area.niovs);
kfree(owner);
}
static dma_addr_t net_devmem_get_dma_addr(const struct net_iov *niov)
{
struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov);
struct dmabuf_genpool_chunk_owner *owner;
owner = net_devmem_iov_to_chunk_owner(niov);
return owner->base_dma_addr +
((dma_addr_t)net_iov_idx(niov) << PAGE_SHIFT);
}
@ -83,7 +92,7 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
offset = dma_addr - owner->base_dma_addr;
index = offset / PAGE_SIZE;
niov = &owner->niovs[index];
niov = &owner->area.niovs[index];
niov->pp_magic = 0;
niov->pp = NULL;
@ -94,7 +103,7 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
void net_devmem_free_dmabuf(struct net_iov *niov)
{
struct net_devmem_dmabuf_binding *binding = net_iov_binding(niov);
struct net_devmem_dmabuf_binding *binding = net_devmem_iov_binding(niov);
unsigned long dma_addr = net_devmem_get_dma_addr(niov);
if (WARN_ON(!gen_pool_has_addr(binding->chunk_pool, dma_addr,
@ -117,6 +126,7 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
WARN_ON(rxq->mp_params.mp_priv != binding);
rxq->mp_params.mp_priv = NULL;
rxq->mp_params.mp_ops = NULL;
rxq_idx = get_netdev_rx_queue_index(rxq);
@ -152,7 +162,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
}
rxq = __netif_get_rx_queue(dev, rxq_idx);
if (rxq->mp_params.mp_priv) {
if (rxq->mp_params.mp_ops) {
NL_SET_ERR_MSG(extack, "designated queue already memory provider bound");
return -EEXIST;
}
@ -170,6 +180,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
return err;
rxq->mp_params.mp_priv = binding;
rxq->mp_params.mp_ops = &dmabuf_devmem_ops;
err = netdev_rx_queue_restart(dev, rxq_idx);
if (err)
@ -179,6 +190,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
err_xa_erase:
rxq->mp_params.mp_priv = NULL;
rxq->mp_params.mp_ops = NULL;
xa_erase(&binding->bound_rxqs, xa_idx);
return err;
@ -261,9 +273,9 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
goto err_free_chunks;
}
owner->base_virtual = virtual;
owner->area.base_virtual = virtual;
owner->base_dma_addr = dma_addr;
owner->num_niovs = len / PAGE_SIZE;
owner->area.num_niovs = len / PAGE_SIZE;
owner->binding = binding;
err = gen_pool_add_owner(binding->chunk_pool, dma_addr,
@ -275,17 +287,17 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
goto err_free_chunks;
}
owner->niovs = kvmalloc_array(owner->num_niovs,
sizeof(*owner->niovs),
GFP_KERNEL);
if (!owner->niovs) {
owner->area.niovs = kvmalloc_array(owner->area.num_niovs,
sizeof(*owner->area.niovs),
GFP_KERNEL);
if (!owner->area.niovs) {
err = -ENOMEM;
goto err_free_chunks;
}
for (i = 0; i < owner->num_niovs; i++) {
niov = &owner->niovs[i];
niov->owner = owner;
for (i = 0; i < owner->area.num_niovs; i++) {
niov = &owner->area.niovs[i];
niov->owner = &owner->area;
page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
net_devmem_get_dma_addr(niov));
}
@ -313,26 +325,6 @@ err_put_dmabuf:
return ERR_PTR(err);
}
void dev_dmabuf_uninstall(struct net_device *dev)
{
struct net_devmem_dmabuf_binding *binding;
struct netdev_rx_queue *rxq;
unsigned long xa_idx;
unsigned int i;
for (i = 0; i < dev->real_num_rx_queues; i++) {
binding = dev->_rx[i].mp_params.mp_priv;
if (!binding)
continue;
xa_for_each(&binding->bound_rxqs, xa_idx, rxq)
if (rxq == &dev->_rx[i]) {
xa_erase(&binding->bound_rxqs, xa_idx);
break;
}
}
}
/*** "Dmabuf devmem memory provider" ***/
int mp_dmabuf_devmem_init(struct page_pool *pool)
@ -398,3 +390,36 @@ bool mp_dmabuf_devmem_release_page(struct page_pool *pool, netmem_ref netmem)
/* We don't want the page pool put_page()ing our net_iovs. */
return false;
}
static int mp_dmabuf_devmem_nl_fill(void *mp_priv, struct sk_buff *rsp,
struct netdev_rx_queue *rxq)
{
const struct net_devmem_dmabuf_binding *binding = mp_priv;
int type = rxq ? NETDEV_A_QUEUE_DMABUF : NETDEV_A_PAGE_POOL_DMABUF;
return nla_put_u32(rsp, type, binding->id);
}
static void mp_dmabuf_devmem_uninstall(void *mp_priv,
struct netdev_rx_queue *rxq)
{
struct net_devmem_dmabuf_binding *binding = mp_priv;
struct netdev_rx_queue *bound_rxq;
unsigned long xa_idx;
xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
if (bound_rxq == rxq) {
xa_erase(&binding->bound_rxqs, xa_idx);
break;
}
}
}
static const struct memory_provider_ops dmabuf_devmem_ops = {
.init = mp_dmabuf_devmem_init,
.destroy = mp_dmabuf_devmem_destroy,
.alloc_netmems = mp_dmabuf_devmem_alloc_netmems,
.release_netmem = mp_dmabuf_devmem_release_page,
.nl_fill = mp_dmabuf_devmem_nl_fill,
.uninstall = mp_dmabuf_devmem_uninstall,
};
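For orientation: the ops table above is the full contract a page-pool memory provider now implements, and the io_uring zcrx provider earlier in this merge registers an equivalent table. Below is a minimal, hypothetical skeleton — the my_* names are invented for illustration and are not part of this series — using only the callback signatures visible in this diff; note that page_pool_init() now insists the ops table lives in kernel rodata.
#include <net/page_pool/memory_provider.h>	/* header introduced by this series */
/* Hypothetical provider skeleton -- a sketch, not a real in-tree provider. */
static int my_mp_init(struct page_pool *pool)
{
	/* pool->mp_priv carries whatever was installed in rxq->mp_params */
	return 0;
}
static void my_mp_destroy(struct page_pool *pool)
{
}
static netmem_ref my_mp_alloc_netmems(struct page_pool *pool, gfp_t gfp)
{
	return 0;	/* hand out provider-owned net_iovs here */
}
static bool my_mp_release_netmem(struct page_pool *pool, netmem_ref netmem)
{
	return false;	/* false: the pool must not put_page() our memory */
}
static int my_mp_nl_fill(void *mp_priv, struct sk_buff *rsp,
			 struct netdev_rx_queue *rxq)
{
	return 0;	/* report provider details over netlink */
}
static void my_mp_uninstall(void *mp_priv, struct netdev_rx_queue *rxq)
{
}
/* Must be const so it ends up in rodata (see the is_kernel_rodata() check). */
static const struct memory_provider_ops my_mp_ops = {
	.init		= my_mp_init,
	.destroy	= my_mp_destroy,
	.alloc_netmems	= my_mp_alloc_netmems,
	.release_netmem	= my_mp_release_netmem,
	.nl_fill	= my_mp_nl_fill,
	.uninstall	= my_mp_uninstall,
};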


@ -10,6 +10,8 @@
#ifndef _NET_DEVMEM_H
#define _NET_DEVMEM_H
#include <net/netmem.h>
struct netlink_ext_ack;
struct net_devmem_dmabuf_binding {
@ -51,17 +53,11 @@ struct net_devmem_dmabuf_binding {
* allocations from this chunk.
*/
struct dmabuf_genpool_chunk_owner {
/* Offset into the dma-buf where this chunk starts. */
unsigned long base_virtual;
struct net_iov_area area;
struct net_devmem_dmabuf_binding *binding;
/* dma_addr of the start of the chunk. */
dma_addr_t base_dma_addr;
/* Array of net_iovs for this chunk. */
struct net_iov *niovs;
size_t num_niovs;
struct net_devmem_dmabuf_binding *binding;
};
void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding);
@ -72,38 +68,34 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
struct net_devmem_dmabuf_binding *binding,
struct netlink_ext_ack *extack);
void dev_dmabuf_uninstall(struct net_device *dev);
static inline struct dmabuf_genpool_chunk_owner *
net_iov_owner(const struct net_iov *niov)
net_devmem_iov_to_chunk_owner(const struct net_iov *niov)
{
return niov->owner;
}
struct net_iov_area *owner = net_iov_owner(niov);
static inline unsigned int net_iov_idx(const struct net_iov *niov)
{
return niov - net_iov_owner(niov)->niovs;
return container_of(owner, struct dmabuf_genpool_chunk_owner, area);
}
static inline struct net_devmem_dmabuf_binding *
net_iov_binding(const struct net_iov *niov)
net_devmem_iov_binding(const struct net_iov *niov)
{
return net_iov_owner(niov)->binding;
return net_devmem_iov_to_chunk_owner(niov)->binding;
}
static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
{
return net_devmem_iov_binding(niov)->id;
}
static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
{
struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov);
struct net_iov_area *owner = net_iov_owner(niov);
return owner->base_virtual +
((unsigned long)net_iov_idx(niov) << PAGE_SHIFT);
}
static inline u32 net_iov_binding_id(const struct net_iov *niov)
{
return net_iov_owner(niov)->binding->id;
}
static inline void
net_devmem_dmabuf_binding_get(struct net_devmem_dmabuf_binding *binding)
{
@ -123,6 +115,8 @@ struct net_iov *
net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding);
void net_devmem_free_dmabuf(struct net_iov *ppiov);
bool net_is_devmem_iov(struct net_iov *niov);
#else
struct net_devmem_dmabuf_binding;
@ -152,10 +146,6 @@ net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
return -EOPNOTSUPP;
}
static inline void dev_dmabuf_uninstall(struct net_device *dev)
{
}
static inline struct net_iov *
net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
{
@ -171,10 +161,15 @@ static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
return 0;
}
static inline u32 net_iov_binding_id(const struct net_iov *niov)
static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
{
return 0;
}
static inline bool net_is_devmem_iov(struct net_iov *niov)
{
return false;
}
#endif
#endif /* _NET_DEVMEM_H */


@ -832,12 +832,10 @@ static int pneigh_ifdown_and_unlock(struct neigh_table *tbl,
return -ENOENT;
}
static void neigh_parms_destroy(struct neigh_parms *parms);
static inline void neigh_parms_put(struct neigh_parms *parms)
{
if (refcount_dec_and_test(&parms->refcnt))
neigh_parms_destroy(parms);
kfree(parms);
}
/*
@ -1713,11 +1711,6 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
}
EXPORT_SYMBOL(neigh_parms_release);
static void neigh_parms_destroy(struct neigh_parms *parms)
{
kfree(parms);
}
static struct lock_class_key neigh_table_proxy_queue_class;
static struct neigh_table __rcu *neigh_tables[NEIGH_NR_TABLES] __read_mostly;


@ -42,6 +42,87 @@ static inline int dev_isalive(const struct net_device *dev)
return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED;
}
/* There is a possible ABBA deadlock between rtnl_lock and kernfs_node->active,
* when unregistering a net device and accessing associated sysfs files. The
* potential deadlock is as follows:
*
*   CPU 0                                      CPU 1
*
*   rtnl_lock                                  vfs_read
*   unregister_netdevice_many                  kernfs_seq_start
*     device_del / kobject_put                   kernfs_get_active (kn->active++)
*   kernfs_drain                                sysfs_kf_seq_show
*     wait_event(                                 rtnl_lock
*       kn->active == KN_DEACTIVATED_BIAS)          -> waits on CPU 0 to release
*                                                       the rtnl lock.
*     -> waits on CPU 1 to decrease kn->active
*
* The historical fix was to use rtnl_trylock with restart_syscall to bail out
* of sysfs operations when the lock couldn't be taken. This fixed the above
* issue as it allowed CPU 1 to bail out of the ABBA situation.
*
* But it came with performance issues, as syscalls were being restarted in
* loops whenever there was contention on the rtnl lock, with huge slowdowns
* in specific scenarios (e.g. lots of virtual interfaces created and userspace
* daemons querying their attributes).
*
* The idea below is to bail out of the active kernfs_node protection
* (kn->active) while trying to take the rtnl lock.
*
* This replaces rtnl_lock() and still has to be used with rtnl_unlock(). The
* net device is guaranteed to be alive if this returns successfully.
*/
static int sysfs_rtnl_lock(struct kobject *kobj, struct attribute *attr,
struct net_device *ndev)
{
struct kernfs_node *kn;
int ret = 0;
/* First, we hold a reference to the net device as the unregistration
* path might run in parallel. This will ensure the net device and the
* associated sysfs objects won't be freed while we try to take the rtnl
* lock.
*/
dev_hold(ndev);
/* sysfs_break_active_protection was introduced to allow self-removal of
* devices and their associated sysfs files by bailing out of the
* sysfs/kernfs protection. We do this here to allow the unregistration
* path to complete in parallel. The following takes a reference on the
* kobject and the kernfs_node being accessed.
*
* This works because we hold a reference onto the net device and the
* unregistration path will wait for us eventually in netdev_run_todo
* (outside an rtnl lock section).
*/
kn = sysfs_break_active_protection(kobj, attr);
/* We can now try to take the rtnl lock. This can't deadlock us as the
* unregistration path is able to drain sysfs files (kernfs_node) thanks
* to the above dance.
*/
if (rtnl_lock_interruptible()) {
ret = -ERESTARTSYS;
goto unbreak;
}
/* Check that dismantle of the device hasn't started; otherwise deny the
* operation.
*/
if (!dev_isalive(ndev)) {
rtnl_unlock();
ret = -ENODEV;
goto unbreak;
}
/* We are now sure the device dismantle hasn't started, and that it cannot
* start before we exit the locking section, as we hold the rtnl lock. There's
* no need to keep bypassing the sysfs protection nor to hold a net device
* reference from this point on; those were only needed to take the rtnl
* lock.
*/
unbreak:
sysfs_unbreak_active_protection(kn);
dev_put(ndev);
return ret;
}
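Every converted attribute handler below follows the same calling pattern for this helper; here is a condensed sketch of that pattern. example_show is a made-up attribute used only for illustration — the real conversions start with netdev_store and carrier_show just below.
/* Sketch only: the shape every converted show/store handler takes. */
static ssize_t example_show(struct device *dev, struct device_attribute *attr,
			    char *buf)
{
	struct net_device *netdev = to_net_dev(dev);
	ssize_t ret;
	ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
	if (ret)
		return ret;		/* -ERESTARTSYS or -ENODEV */
	/* rtnl is held and the device is known to be alive here */
	ret = sysfs_emit(buf, "%d\n", 0);
	rtnl_unlock();
	return ret;
}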
/* use same locking rules as GIF* ioctl's */
static ssize_t netdev_show(const struct device *dev,
struct device_attribute *attr, char *buf,
@ -95,14 +176,14 @@ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
if (ret)
goto err;
if (!rtnl_trylock())
return restart_syscall();
ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
if (ret)
goto err;
ret = (*set)(netdev, new);
if (ret == 0)
ret = len;
if (dev_isalive(netdev)) {
ret = (*set)(netdev, new);
if (ret == 0)
ret = len;
}
rtnl_unlock();
err:
return ret;
@ -220,7 +301,7 @@ static ssize_t carrier_store(struct device *dev, struct device_attribute *attr,
struct net_device *netdev = to_net_dev(dev);
/* The check is also done in change_carrier; this helps returning early
* without hitting the trylock/restart in netdev_store.
* without hitting the locking section in netdev_store.
*/
if (!netdev->netdev_ops->ndo_change_carrier)
return -EOPNOTSUPP;
@ -234,8 +315,9 @@ static ssize_t carrier_show(struct device *dev,
struct net_device *netdev = to_net_dev(dev);
int ret = -EINVAL;
if (!rtnl_trylock())
return restart_syscall();
ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
if (ret)
return ret;
if (netif_running(netdev)) {
/* Synchronize carrier state with link watch,
@ -245,8 +327,8 @@ static ssize_t carrier_show(struct device *dev,
ret = sysfs_emit(buf, fmt_dec, !!netif_carrier_ok(netdev));
}
rtnl_unlock();
rtnl_unlock();
return ret;
}
static DEVICE_ATTR_RW(carrier);
@ -258,13 +340,14 @@ static ssize_t speed_show(struct device *dev,
int ret = -EINVAL;
/* The check is also done in __ethtool_get_link_ksettings; this helps
* returning early without hitting the trylock/restart below.
* returning early without hitting the locking section below.
*/
if (!netdev->ethtool_ops->get_link_ksettings)
return ret;
if (!rtnl_trylock())
return restart_syscall();
ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
if (ret)
return ret;
if (netif_running(netdev)) {
struct ethtool_link_ksettings cmd;
@ -284,13 +367,14 @@ static ssize_t duplex_show(struct device *dev,
int ret = -EINVAL;
/* The check is also done in __ethtool_get_link_ksettings; this helps
* returning early without hitting the trylock/restart below.
* returning early without hitting the locking section below.
*/
if (!netdev->ethtool_ops->get_link_ksettings)
return ret;
if (!rtnl_trylock())
return restart_syscall();
ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
if (ret)
return ret;
if (netif_running(netdev)) {
struct ethtool_link_ksettings cmd;
@ -490,16 +574,15 @@ static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
if (len > 0 && buf[len - 1] == '\n')
--count;
if (!rtnl_trylock())
return restart_syscall();
ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
if (ret)
return ret;
if (dev_isalive(netdev)) {
ret = dev_set_alias(netdev, buf, count);
if (ret < 0)
goto err;
ret = len;
netdev_state_change(netdev);
}
ret = dev_set_alias(netdev, buf, count);
if (ret < 0)
goto err;
ret = len;
netdev_state_change(netdev);
err:
rtnl_unlock();
@ -551,24 +634,23 @@ static ssize_t phys_port_id_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct net_device *netdev = to_net_dev(dev);
struct netdev_phys_item_id ppid;
ssize_t ret = -EINVAL;
/* The check is also done in dev_get_phys_port_id; this helps returning
* early without hitting the trylock/restart below.
* early without hitting the locking section below.
*/
if (!netdev->netdev_ops->ndo_get_phys_port_id)
return -EOPNOTSUPP;
if (!rtnl_trylock())
return restart_syscall();
ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
if (ret)
return ret;
if (dev_isalive(netdev)) {
struct netdev_phys_item_id ppid;
ret = dev_get_phys_port_id(netdev, &ppid);
if (!ret)
ret = sysfs_emit(buf, "%*phN\n", ppid.id_len, ppid.id);
ret = dev_get_phys_port_id(netdev, &ppid);
if (!ret)
ret = sysfs_emit(buf, "%*phN\n", ppid.id_len, ppid.id);
}
rtnl_unlock();
return ret;
@ -580,24 +662,23 @@ static ssize_t phys_port_name_show(struct device *dev,
{
struct net_device *netdev = to_net_dev(dev);
ssize_t ret = -EINVAL;
char name[IFNAMSIZ];
/* The checks are also done in dev_get_phys_port_name; this helps
* returning early without hitting the trylock/restart below.
* returning early without hitting the locking section below.
*/
if (!netdev->netdev_ops->ndo_get_phys_port_name &&
!netdev->devlink_port)
return -EOPNOTSUPP;
if (!rtnl_trylock())
return restart_syscall();
ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
if (ret)
return ret;
if (dev_isalive(netdev)) {
char name[IFNAMSIZ];
ret = dev_get_phys_port_name(netdev, name, sizeof(name));
if (!ret)
ret = sysfs_emit(buf, "%s\n", name);
ret = dev_get_phys_port_name(netdev, name, sizeof(name));
if (!ret)
ret = sysfs_emit(buf, "%s\n", name);
}
rtnl_unlock();
return ret;
@ -608,26 +689,25 @@ static ssize_t phys_switch_id_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct net_device *netdev = to_net_dev(dev);
struct netdev_phys_item_id ppid = { };
ssize_t ret = -EINVAL;
/* The checks are also done in dev_get_phys_port_name; this helps
* returning early without hitting the trylock/restart below. This works
* returning early without hitting the locking section below. This works
* because recurse is false when calling dev_get_port_parent_id.
*/
if (!netdev->netdev_ops->ndo_get_port_parent_id &&
!netdev->devlink_port)
return -EOPNOTSUPP;
if (!rtnl_trylock())
return restart_syscall();
ret = sysfs_rtnl_lock(&dev->kobj, &attr->attr, netdev);
if (ret)
return ret;
if (dev_isalive(netdev)) {
struct netdev_phys_item_id ppid = { };
ret = dev_get_port_parent_id(netdev, &ppid, false);
if (!ret)
ret = sysfs_emit(buf, "%*phN\n", ppid.id_len, ppid.id);
ret = dev_get_port_parent_id(netdev, &ppid, false);
if (!ret)
ret = sysfs_emit(buf, "%*phN\n", ppid.id_len, ppid.id);
}
rtnl_unlock();
return ret;
@ -1108,7 +1188,6 @@ static void rx_queue_get_ownership(const struct kobject *kobj,
static const struct kobj_type rx_queue_ktype = {
.sysfs_ops = &rx_queue_sysfs_ops,
.release = rx_queue_release,
.default_groups = rx_queue_default_groups,
.namespace = rx_queue_namespace,
.get_ownership = rx_queue_get_ownership,
};
@ -1131,6 +1210,22 @@ static int rx_queue_add_kobject(struct net_device *dev, int index)
struct kobject *kobj = &queue->kobj;
int error = 0;
/* Rx queues are cleared in rx_queue_release to allow later
* re-registration. This is triggered when their kobj refcount is
* dropped.
*
* If a queue is removed while both a read (or write) operation and a
* re-addition of the same queue are pending (waiting on rtnl_lock),
* it might happen that the re-addition executes before the read,
* making the initial removal never happen (the queue's kobj refcount
* won't drop enough because of the pending read). In such a rare case,
* return to allow the removal operation to complete.
*/
if (unlikely(kobj->state_initialized)) {
netdev_warn_once(dev, "Cannot re-add rx queues before their removal completed");
return -EAGAIN;
}
/* Kobject_put later will trigger rx_queue_release call which
* decreases dev refcount: Take that reference here
*/
@ -1142,20 +1237,27 @@ static int rx_queue_add_kobject(struct net_device *dev, int index)
if (error)
goto err;
queue->groups = rx_queue_default_groups;
error = sysfs_create_groups(kobj, queue->groups);
if (error)
goto err;
if (dev->sysfs_rx_queue_group) {
error = sysfs_create_group(kobj, dev->sysfs_rx_queue_group);
if (error)
goto err;
goto err_default_groups;
}
error = rx_queue_default_mask(dev, queue);
if (error)
goto err;
goto err_default_groups;
kobject_uevent(kobj, KOBJ_ADD);
return error;
err_default_groups:
sysfs_remove_groups(kobj, queue->groups);
err:
kobject_put(kobj);
return error;
@ -1200,12 +1302,14 @@ net_rx_queue_update_kobjects(struct net_device *dev, int old_num, int new_num)
}
while (--i >= new_num) {
struct kobject *kobj = &dev->_rx[i].kobj;
struct netdev_rx_queue *queue = &dev->_rx[i];
struct kobject *kobj = &queue->kobj;
if (!refcount_read(&dev_net(dev)->ns.count))
kobj->uevent_suppress = 1;
if (dev->sysfs_rx_queue_group)
sysfs_remove_group(kobj, dev->sysfs_rx_queue_group);
sysfs_remove_groups(kobj, queue->groups);
kobject_put(kobj);
}
@ -1244,9 +1348,11 @@ static int net_rx_queue_change_owner(struct net_device *dev, int num,
*/
struct netdev_queue_attribute {
struct attribute attr;
ssize_t (*show)(struct netdev_queue *queue, char *buf);
ssize_t (*store)(struct netdev_queue *queue,
const char *buf, size_t len);
ssize_t (*show)(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, char *buf);
ssize_t (*store)(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, const char *buf,
size_t len);
};
#define to_netdev_queue_attr(_attr) \
container_of(_attr, struct netdev_queue_attribute, attr)
@ -1263,7 +1369,7 @@ static ssize_t netdev_queue_attr_show(struct kobject *kobj,
if (!attribute->show)
return -EIO;
return attribute->show(queue, buf);
return attribute->show(kobj, attr, queue, buf);
}
static ssize_t netdev_queue_attr_store(struct kobject *kobj,
@ -1277,7 +1383,7 @@ static ssize_t netdev_queue_attr_store(struct kobject *kobj,
if (!attribute->store)
return -EIO;
return attribute->store(queue, buf, count);
return attribute->store(kobj, attr, queue, buf, count);
}
static const struct sysfs_ops netdev_queue_sysfs_ops = {
@ -1285,7 +1391,8 @@ static const struct sysfs_ops netdev_queue_sysfs_ops = {
.store = netdev_queue_attr_store,
};
static ssize_t tx_timeout_show(struct netdev_queue *queue, char *buf)
static ssize_t tx_timeout_show(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, char *buf)
{
unsigned long trans_timeout = atomic_long_read(&queue->trans_timeout);
@ -1303,18 +1410,18 @@ static unsigned int get_netdev_queue_index(struct netdev_queue *queue)
return i;
}
static ssize_t traffic_class_show(struct netdev_queue *queue,
char *buf)
static ssize_t traffic_class_show(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, char *buf)
{
struct net_device *dev = queue->dev;
int num_tc, tc;
int index;
int num_tc, tc, index, ret;
if (!netif_is_multiqueue(dev))
return -ENOENT;
if (!rtnl_trylock())
return restart_syscall();
ret = sysfs_rtnl_lock(kobj, attr, queue->dev);
if (ret)
return ret;
index = get_netdev_queue_index(queue);
@ -1341,24 +1448,25 @@ static ssize_t traffic_class_show(struct netdev_queue *queue,
}
#ifdef CONFIG_XPS
static ssize_t tx_maxrate_show(struct netdev_queue *queue,
char *buf)
static ssize_t tx_maxrate_show(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, char *buf)
{
return sysfs_emit(buf, "%lu\n", queue->tx_maxrate);
}
static ssize_t tx_maxrate_store(struct netdev_queue *queue,
const char *buf, size_t len)
static ssize_t tx_maxrate_store(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, const char *buf,
size_t len)
{
struct net_device *dev = queue->dev;
int err, index = get_netdev_queue_index(queue);
struct net_device *dev = queue->dev;
u32 rate = 0;
if (!capable(CAP_NET_ADMIN))
return -EPERM;
/* The check is also done later; this helps returning early without
* hitting the trylock/restart below.
* hitting the locking section below.
*/
if (!dev->netdev_ops->ndo_set_tx_maxrate)
return -EOPNOTSUPP;
@ -1367,18 +1475,21 @@ static ssize_t tx_maxrate_store(struct netdev_queue *queue,
if (err < 0)
return err;
if (!rtnl_trylock())
return restart_syscall();
err = sysfs_rtnl_lock(kobj, attr, dev);
if (err)
return err;
err = -EOPNOTSUPP;
if (dev->netdev_ops->ndo_set_tx_maxrate)
err = dev->netdev_ops->ndo_set_tx_maxrate(dev, index, rate);
rtnl_unlock();
if (!err) {
queue->tx_maxrate = rate;
rtnl_unlock();
return len;
}
rtnl_unlock();
return err;
}
@ -1422,16 +1533,17 @@ static ssize_t bql_set(const char *buf, const size_t count,
return count;
}
static ssize_t bql_show_hold_time(struct netdev_queue *queue,
char *buf)
static ssize_t bql_show_hold_time(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, char *buf)
{
struct dql *dql = &queue->dql;
return sysfs_emit(buf, "%u\n", jiffies_to_msecs(dql->slack_hold_time));
}
static ssize_t bql_set_hold_time(struct netdev_queue *queue,
const char *buf, size_t len)
static ssize_t bql_set_hold_time(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, const char *buf,
size_t len)
{
struct dql *dql = &queue->dql;
unsigned int value;
@ -1450,15 +1562,17 @@ static struct netdev_queue_attribute bql_hold_time_attribute __ro_after_init
= __ATTR(hold_time, 0644,
bql_show_hold_time, bql_set_hold_time);
static ssize_t bql_show_stall_thrs(struct netdev_queue *queue, char *buf)
static ssize_t bql_show_stall_thrs(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, char *buf)
{
struct dql *dql = &queue->dql;
return sysfs_emit(buf, "%u\n", jiffies_to_msecs(dql->stall_thrs));
}
static ssize_t bql_set_stall_thrs(struct netdev_queue *queue,
const char *buf, size_t len)
static ssize_t bql_set_stall_thrs(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, const char *buf,
size_t len)
{
struct dql *dql = &queue->dql;
unsigned int value;
@ -1484,13 +1598,15 @@ static ssize_t bql_set_stall_thrs(struct netdev_queue *queue,
static struct netdev_queue_attribute bql_stall_thrs_attribute __ro_after_init =
__ATTR(stall_thrs, 0644, bql_show_stall_thrs, bql_set_stall_thrs);
static ssize_t bql_show_stall_max(struct netdev_queue *queue, char *buf)
static ssize_t bql_show_stall_max(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, char *buf)
{
return sysfs_emit(buf, "%u\n", READ_ONCE(queue->dql.stall_max));
}
static ssize_t bql_set_stall_max(struct netdev_queue *queue,
const char *buf, size_t len)
static ssize_t bql_set_stall_max(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, const char *buf,
size_t len)
{
WRITE_ONCE(queue->dql.stall_max, 0);
return len;
@ -1499,7 +1615,8 @@ static ssize_t bql_set_stall_max(struct netdev_queue *queue,
static struct netdev_queue_attribute bql_stall_max_attribute __ro_after_init =
__ATTR(stall_max, 0644, bql_show_stall_max, bql_set_stall_max);
static ssize_t bql_show_stall_cnt(struct netdev_queue *queue, char *buf)
static ssize_t bql_show_stall_cnt(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, char *buf)
{
struct dql *dql = &queue->dql;
@ -1509,8 +1626,8 @@ static ssize_t bql_show_stall_cnt(struct netdev_queue *queue, char *buf)
static struct netdev_queue_attribute bql_stall_cnt_attribute __ro_after_init =
__ATTR(stall_cnt, 0444, bql_show_stall_cnt, NULL);
static ssize_t bql_show_inflight(struct netdev_queue *queue,
char *buf)
static ssize_t bql_show_inflight(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, char *buf)
{
struct dql *dql = &queue->dql;
@ -1521,13 +1638,16 @@ static struct netdev_queue_attribute bql_inflight_attribute __ro_after_init =
__ATTR(inflight, 0444, bql_show_inflight, NULL);
#define BQL_ATTR(NAME, FIELD) \
static ssize_t bql_show_ ## NAME(struct netdev_queue *queue, \
char *buf) \
static ssize_t bql_show_ ## NAME(struct kobject *kobj, \
struct attribute *attr, \
struct netdev_queue *queue, char *buf) \
{ \
return bql_show(buf, queue->dql.FIELD); \
} \
\
static ssize_t bql_set_ ## NAME(struct netdev_queue *queue, \
static ssize_t bql_set_ ## NAME(struct kobject *kobj, \
struct attribute *attr, \
struct netdev_queue *queue, \
const char *buf, size_t len) \
{ \
return bql_set(buf, len, &queue->dql.FIELD); \
@ -1613,19 +1733,21 @@ out_no_maps:
return len < PAGE_SIZE ? len : -EINVAL;
}
static ssize_t xps_cpus_show(struct netdev_queue *queue, char *buf)
static ssize_t xps_cpus_show(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, char *buf)
{
struct net_device *dev = queue->dev;
unsigned int index;
int len, tc;
int len, tc, ret;
if (!netif_is_multiqueue(dev))
return -ENOENT;
index = get_netdev_queue_index(queue);
if (!rtnl_trylock())
return restart_syscall();
ret = sysfs_rtnl_lock(kobj, attr, queue->dev);
if (ret)
return ret;
/* If queue belongs to subordinate dev use its map */
dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
@ -1636,18 +1758,21 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue, char *buf)
return -EINVAL;
}
/* Make sure the subordinate device can't be freed */
get_device(&dev->dev);
/* Increase the net device refcnt to make sure it won't be freed while
* xps_queue_show is running.
*/
dev_hold(dev);
rtnl_unlock();
len = xps_queue_show(dev, index, tc, buf, XPS_CPUS);
put_device(&dev->dev);
dev_put(dev);
return len;
}
static ssize_t xps_cpus_store(struct netdev_queue *queue,
const char *buf, size_t len)
static ssize_t xps_cpus_store(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, const char *buf,
size_t len)
{
struct net_device *dev = queue->dev;
unsigned int index;
@ -1671,9 +1796,10 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
return err;
}
if (!rtnl_trylock()) {
err = sysfs_rtnl_lock(kobj, attr, dev);
if (err) {
free_cpumask_var(mask);
return restart_syscall();
return err;
}
err = netif_set_xps_queue(dev, mask, index);
@ -1687,26 +1813,34 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init
= __ATTR_RW(xps_cpus);
static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
static ssize_t xps_rxqs_show(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, char *buf)
{
struct net_device *dev = queue->dev;
unsigned int index;
int tc;
int tc, ret;
index = get_netdev_queue_index(queue);
if (!rtnl_trylock())
return restart_syscall();
ret = sysfs_rtnl_lock(kobj, attr, dev);
if (ret)
return ret;
tc = netdev_txq_to_tc(dev, index);
rtnl_unlock();
if (tc < 0)
return -EINVAL;
return xps_queue_show(dev, index, tc, buf, XPS_RXQS);
/* Increase the net device refcnt to make sure it won't be freed while
* xps_queue_show is running.
*/
dev_hold(dev);
rtnl_unlock();
ret = tc >= 0 ? xps_queue_show(dev, index, tc, buf, XPS_RXQS) : -EINVAL;
dev_put(dev);
return ret;
}
static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
static ssize_t xps_rxqs_store(struct kobject *kobj, struct attribute *attr,
struct netdev_queue *queue, const char *buf,
size_t len)
{
struct net_device *dev = queue->dev;
@ -1730,9 +1864,10 @@ static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
return err;
}
if (!rtnl_trylock()) {
err = sysfs_rtnl_lock(kobj, attr, dev);
if (err) {
bitmap_free(mask);
return restart_syscall();
return err;
}
cpus_read_lock();
@ -1792,7 +1927,6 @@ static void netdev_queue_get_ownership(const struct kobject *kobj,
static const struct kobj_type netdev_queue_ktype = {
.sysfs_ops = &netdev_queue_sysfs_ops,
.release = netdev_queue_release,
.default_groups = netdev_queue_default_groups,
.namespace = netdev_queue_namespace,
.get_ownership = netdev_queue_get_ownership,
};
@ -1811,6 +1945,22 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index)
struct kobject *kobj = &queue->kobj;
int error = 0;
/* Tx queues are cleared in netdev_queue_release to allow later
* re-registration. This is triggered when their kobj refcount is
* dropped.
*
* If a queue is removed while both a read (or write) operation and a
* re-addition of the same queue are pending (waiting on rtnl_lock),
* it might happen that the re-addition executes before the read,
* making the initial removal never happen (the queue's kobj refcount
* won't drop enough because of the pending read). In such a rare case,
* return to allow the removal operation to complete.
*/
if (unlikely(kobj->state_initialized)) {
netdev_warn_once(dev, "Cannot re-add tx queues before their removal completed");
return -EAGAIN;
}
/* Kobject_put later will trigger netdev_queue_release call
* which decreases dev refcount: Take that reference here
*/
@ -1822,15 +1972,22 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index)
if (error)
goto err;
queue->groups = netdev_queue_default_groups;
error = sysfs_create_groups(kobj, queue->groups);
if (error)
goto err;
if (netdev_uses_bql(dev)) {
error = sysfs_create_group(kobj, &dql_group);
if (error)
goto err;
goto err_default_groups;
}
kobject_uevent(kobj, KOBJ_ADD);
return 0;
err_default_groups:
sysfs_remove_groups(kobj, queue->groups);
err:
kobject_put(kobj);
return error;
@ -1885,6 +2042,7 @@ netdev_queue_update_kobjects(struct net_device *dev, int old_num, int new_num)
if (netdev_uses_bql(dev))
sysfs_remove_group(&queue->kobj, &dql_group);
sysfs_remove_groups(&queue->kobj, queue->groups);
kobject_put(&queue->kobj);
}


@ -10,6 +10,7 @@
#include <net/sock.h>
#include <net/xdp.h>
#include <net/xdp_sock.h>
#include <net/page_pool/memory_provider.h>
#include "dev.h"
#include "devmem.h"
@ -368,7 +369,7 @@ static int
netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
u32 q_idx, u32 q_type, const struct genl_info *info)
{
struct net_devmem_dmabuf_binding *binding;
struct pp_memory_provider_params *params;
struct netdev_rx_queue *rxq;
struct netdev_queue *txq;
void *hdr;
@ -385,15 +386,15 @@ netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
switch (q_type) {
case NETDEV_QUEUE_TYPE_RX:
rxq = __netif_get_rx_queue(netdev, q_idx);
if (rxq->napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID,
rxq->napi->napi_id))
goto nla_put_failure;
binding = rxq->mp_params.mp_priv;
if (binding &&
nla_put_u32(rsp, NETDEV_A_QUEUE_DMABUF, binding->id))
params = &rxq->mp_params;
if (params->mp_ops &&
params->mp_ops->nl_fill(params->mp_priv, rsp, rxq))
goto nla_put_failure;
break;
case NETDEV_QUEUE_TYPE_TX:
txq = netdev_get_tx_queue(netdev, q_idx);


@ -3,6 +3,7 @@
#include <linux/netdevice.h>
#include <net/netdev_queues.h>
#include <net/netdev_rx_queue.h>
#include <net/page_pool/memory_provider.h>
#include "page_pool_priv.h"
@ -80,3 +81,71 @@ err_free_new_mem:
return err;
}
EXPORT_SYMBOL_NS_GPL(netdev_rx_queue_restart, "NETDEV_INTERNAL");
static int __net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
struct pp_memory_provider_params *p)
{
struct netdev_rx_queue *rxq;
int ret;
if (ifq_idx >= dev->real_num_rx_queues)
return -EINVAL;
ifq_idx = array_index_nospec(ifq_idx, dev->real_num_rx_queues);
rxq = __netif_get_rx_queue(dev, ifq_idx);
if (rxq->mp_params.mp_ops)
return -EEXIST;
rxq->mp_params = *p;
ret = netdev_rx_queue_restart(dev, ifq_idx);
if (ret) {
rxq->mp_params.mp_ops = NULL;
rxq->mp_params.mp_priv = NULL;
}
return ret;
}
int net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
struct pp_memory_provider_params *p)
{
int ret;
rtnl_lock();
ret = __net_mp_open_rxq(dev, ifq_idx, p);
rtnl_unlock();
return ret;
}
static void __net_mp_close_rxq(struct net_device *dev, unsigned ifq_idx,
struct pp_memory_provider_params *old_p)
{
struct netdev_rx_queue *rxq;
if (WARN_ON_ONCE(ifq_idx >= dev->real_num_rx_queues))
return;
rxq = __netif_get_rx_queue(dev, ifq_idx);
/* Callers holding a netdev ref may get here after we already
* went thru shutdown via dev_memory_provider_uninstall().
*/
if (dev->reg_state > NETREG_REGISTERED &&
!rxq->mp_params.mp_ops)
return;
if (WARN_ON_ONCE(rxq->mp_params.mp_ops != old_p->mp_ops ||
rxq->mp_params.mp_priv != old_p->mp_priv))
return;
rxq->mp_params.mp_ops = NULL;
rxq->mp_params.mp_priv = NULL;
WARN_ON(netdev_rx_queue_restart(dev, ifq_idx));
}
void net_mp_close_rxq(struct net_device *dev, unsigned ifq_idx,
struct pp_memory_provider_params *old_p)
{
rtnl_lock();
__net_mp_close_rxq(dev, ifq_idx, old_p);
rtnl_unlock();
}
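These two exported helpers are what an in-kernel memory provider (the io_uring zcrx code earlier in this diff is the first user) calls to take over an rx queue and later hand it back. A hedged sketch of the expected pairing follows; my_provider_bind, my_provider_unbind, my_mp_ops and my_state are placeholders for provider-specific pieces and are not part of this series.
/* Sketch: bind a provider to rx queue 'qid', then unbind it on teardown. */
static int my_provider_bind(struct net_device *dev, unsigned int qid,
			    void *my_state)
{
	struct pp_memory_provider_params p = {
		.mp_priv = my_state,
		.mp_ops	 = &my_mp_ops,		/* hypothetical ops table */
	};
	/* Restarts the queue; -EEXIST if another provider is already bound. */
	return net_mp_open_rxq(dev, qid, &p);
}
static void my_provider_unbind(struct net_device *dev, unsigned int qid,
			       void *my_state)
{
	struct pp_memory_provider_params old = {
		.mp_priv = my_state,
		.mp_ops	 = &my_mp_ops,
	};
	/* Clears mp_params and restarts the queue back to normal pages. */
	net_mp_close_rxq(dev, qid, &old);
}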


@ -13,6 +13,7 @@
#include <net/netdev_rx_queue.h>
#include <net/page_pool/helpers.h>
#include <net/page_pool/memory_provider.h>
#include <net/xdp.h>
#include <linux/dma-direction.h>
@ -285,13 +286,19 @@ static int page_pool_init(struct page_pool *pool,
rxq = __netif_get_rx_queue(pool->slow.netdev,
pool->slow.queue_idx);
pool->mp_priv = rxq->mp_params.mp_priv;
pool->mp_ops = rxq->mp_params.mp_ops;
}
if (pool->mp_priv) {
if (pool->mp_ops) {
if (!pool->dma_map || !pool->dma_sync)
return -EOPNOTSUPP;
err = mp_dmabuf_devmem_init(pool);
if (WARN_ON(!is_kernel_rodata((unsigned long)pool->mp_ops))) {
err = -EFAULT;
goto free_ptr_ring;
}
err = pool->mp_ops->init(pool);
if (err) {
pr_warn("%s() mem-provider init failed %d\n", __func__,
err);
@ -587,8 +594,8 @@ netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp)
return netmem;
/* Slow-path: cache empty, do real allocation */
if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv)
netmem = mp_dmabuf_devmem_alloc_netmems(pool, gfp);
if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
netmem = pool->mp_ops->alloc_netmems(pool, gfp);
else
netmem = __page_pool_alloc_pages_slow(pool, gfp);
return netmem;
@ -679,8 +686,8 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
bool put;
put = true;
if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv)
put = mp_dmabuf_devmem_release_page(pool, netmem);
if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
put = pool->mp_ops->release_netmem(pool, netmem);
else
__page_pool_release_page_dma(pool, netmem);
@ -1048,8 +1055,8 @@ static void __page_pool_destroy(struct page_pool *pool)
page_pool_unlist(pool);
page_pool_uninit(pool);
if (pool->mp_priv) {
mp_dmabuf_devmem_destroy(pool);
if (pool->mp_ops) {
pool->mp_ops->destroy(pool);
static_branch_dec(&page_pool_mem_providers);
}
@ -1190,3 +1197,31 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
}
}
EXPORT_SYMBOL(page_pool_update_nid);
bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr)
{
return page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), addr);
}
/* Associate a niov with a page pool. Should be followed by a matching
* net_mp_niov_clear_page_pool().
*/
void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov)
{
netmem_ref netmem = net_iov_to_netmem(niov);
page_pool_set_pp_info(pool, netmem);
pool->pages_state_hold_cnt++;
trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt);
}
/* Disassociate a niov from a page pool. Should only be used in the
* ->release_netmem() path.
*/
void net_mp_niov_clear_page_pool(struct net_iov *niov)
{
netmem_ref netmem = net_iov_to_netmem(niov);
page_pool_clear_pp_info(netmem);
}
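The helpers above are the glue a provider's alloc_netmems() and release_netmem() callbacks are expected to use; below is a minimal, hypothetical pairing (freelist and DMA handling omitted, my_mp_* names invented for illustration).
/* Alloc path: associate a provider-owned niov with the pool and hand it out. */
static netmem_ref my_mp_hand_out(struct page_pool *pool, struct net_iov *niov)
{
	net_mp_niov_set_page_pool(pool, niov);
	return net_iov_to_netmem(niov);
}
/* ->release_netmem() path: break the association and keep the niov around. */
static bool my_mp_take_back(struct page_pool *pool, netmem_ref netmem)
{
	struct net_iov *niov = netmem_to_net_iov(netmem);
	net_mp_niov_clear_page_pool(niov);
	return false;	/* don't let the pool put_page() it */
}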


@ -8,9 +8,9 @@
#include <net/netdev_rx_queue.h>
#include <net/page_pool/helpers.h>
#include <net/page_pool/types.h>
#include <net/page_pool/memory_provider.h>
#include <net/sock.h>
#include "devmem.h"
#include "page_pool_priv.h"
#include "netdev-genl-gen.h"
@ -216,7 +216,6 @@ static int
page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
const struct genl_info *info)
{
struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
size_t inflight, refsz;
unsigned int napi_id;
void *hdr;
@ -249,7 +248,7 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
pool->user.detach_time))
goto err_cancel;
if (binding && nla_put_u32(rsp, NETDEV_A_PAGE_POOL_DMABUF, binding->id))
if (pool->mp_ops && pool->mp_ops->nl_fill(pool->mp_priv, rsp, NULL))
goto err_cancel;
genlmsg_end(rsp, hdr);
@ -356,7 +355,7 @@ void page_pool_unlist(struct page_pool *pool)
int page_pool_check_memory_provider(struct net_device *dev,
struct netdev_rx_queue *rxq)
{
struct net_devmem_dmabuf_binding *binding = rxq->mp_params.mp_priv;
void *binding = rxq->mp_params.mp_priv;
struct page_pool *pool;
struct hlist_node *n;


@ -80,6 +80,11 @@ void rtnl_lock(void)
}
EXPORT_SYMBOL(rtnl_lock);
int rtnl_lock_interruptible(void)
{
return mutex_lock_interruptible(&rtnl_mutex);
}
int rtnl_lock_killable(void)
{
return mutex_lock_killable(&rtnl_mutex);


@ -214,6 +214,24 @@ const char link_mode_names[][ETH_GSTRING_LEN] = {
__DEFINE_LINK_MODE_NAME(10, T1S, Half),
__DEFINE_LINK_MODE_NAME(10, T1S_P2MP, Half),
__DEFINE_LINK_MODE_NAME(10, T1BRR, Full),
__DEFINE_LINK_MODE_NAME(200000, CR, Full),
__DEFINE_LINK_MODE_NAME(200000, KR, Full),
__DEFINE_LINK_MODE_NAME(200000, DR, Full),
__DEFINE_LINK_MODE_NAME(200000, DR_2, Full),
__DEFINE_LINK_MODE_NAME(200000, SR, Full),
__DEFINE_LINK_MODE_NAME(200000, VR, Full),
__DEFINE_LINK_MODE_NAME(400000, CR2, Full),
__DEFINE_LINK_MODE_NAME(400000, KR2, Full),
__DEFINE_LINK_MODE_NAME(400000, DR2, Full),
__DEFINE_LINK_MODE_NAME(400000, DR2_2, Full),
__DEFINE_LINK_MODE_NAME(400000, SR2, Full),
__DEFINE_LINK_MODE_NAME(400000, VR2, Full),
__DEFINE_LINK_MODE_NAME(800000, CR4, Full),
__DEFINE_LINK_MODE_NAME(800000, KR4, Full),
__DEFINE_LINK_MODE_NAME(800000, DR4, Full),
__DEFINE_LINK_MODE_NAME(800000, DR4_2, Full),
__DEFINE_LINK_MODE_NAME(800000, SR4, Full),
__DEFINE_LINK_MODE_NAME(800000, VR4, Full),
};
static_assert(ARRAY_SIZE(link_mode_names) == __ETHTOOL_LINK_MODE_MASK_NBITS);
@ -222,8 +240,11 @@ static_assert(ARRAY_SIZE(link_mode_names) == __ETHTOOL_LINK_MODE_MASK_NBITS);
#define __LINK_MODE_LANES_CR4 4
#define __LINK_MODE_LANES_CR8 8
#define __LINK_MODE_LANES_DR 1
#define __LINK_MODE_LANES_DR_2 1
#define __LINK_MODE_LANES_DR2 2
#define __LINK_MODE_LANES_DR2_2 2
#define __LINK_MODE_LANES_DR4 4
#define __LINK_MODE_LANES_DR4_2 4
#define __LINK_MODE_LANES_DR8 8
#define __LINK_MODE_LANES_KR 1
#define __LINK_MODE_LANES_KR2 2
@ -252,6 +273,9 @@ static_assert(ARRAY_SIZE(link_mode_names) == __ETHTOOL_LINK_MODE_MASK_NBITS);
#define __LINK_MODE_LANES_T1L 1
#define __LINK_MODE_LANES_T1S 1
#define __LINK_MODE_LANES_T1S_P2MP 1
#define __LINK_MODE_LANES_VR 1
#define __LINK_MODE_LANES_VR2 2
#define __LINK_MODE_LANES_VR4 4
#define __LINK_MODE_LANES_VR8 8
#define __LINK_MODE_LANES_DR8_2 8
#define __LINK_MODE_LANES_T1BRR 1
@ -379,6 +403,24 @@ const struct link_mode_info link_mode_params[] = {
__DEFINE_LINK_MODE_PARAMS(10, T1S, Half),
__DEFINE_LINK_MODE_PARAMS(10, T1S_P2MP, Half),
__DEFINE_LINK_MODE_PARAMS(10, T1BRR, Full),
__DEFINE_LINK_MODE_PARAMS(200000, CR, Full),
__DEFINE_LINK_MODE_PARAMS(200000, KR, Full),
__DEFINE_LINK_MODE_PARAMS(200000, DR, Full),
__DEFINE_LINK_MODE_PARAMS(200000, DR_2, Full),
__DEFINE_LINK_MODE_PARAMS(200000, SR, Full),
__DEFINE_LINK_MODE_PARAMS(200000, VR, Full),
__DEFINE_LINK_MODE_PARAMS(400000, CR2, Full),
__DEFINE_LINK_MODE_PARAMS(400000, KR2, Full),
__DEFINE_LINK_MODE_PARAMS(400000, DR2, Full),
__DEFINE_LINK_MODE_PARAMS(400000, DR2_2, Full),
__DEFINE_LINK_MODE_PARAMS(400000, SR2, Full),
__DEFINE_LINK_MODE_PARAMS(400000, VR2, Full),
__DEFINE_LINK_MODE_PARAMS(800000, CR4, Full),
__DEFINE_LINK_MODE_PARAMS(800000, KR4, Full),
__DEFINE_LINK_MODE_PARAMS(800000, DR4, Full),
__DEFINE_LINK_MODE_PARAMS(800000, DR4_2, Full),
__DEFINE_LINK_MODE_PARAMS(800000, SR4, Full),
__DEFINE_LINK_MODE_PARAMS(800000, VR4, Full),
};
static_assert(ARRAY_SIZE(link_mode_params) == __ETHTOOL_LINK_MODE_MASK_NBITS);


@ -141,7 +141,6 @@ static int ipgre_err(struct sk_buff *skb, u32 info,
const struct iphdr *iph;
const int type = icmp_hdr(skb)->type;
const int code = icmp_hdr(skb)->code;
unsigned int data_len = 0;
struct ip_tunnel *t;
if (tpi->proto == htons(ETH_P_TEB))
@ -182,7 +181,6 @@ static int ipgre_err(struct sk_buff *skb, u32 info,
case ICMP_TIME_EXCEEDED:
if (code != ICMP_EXC_TTL)
return 0;
data_len = icmp_hdr(skb)->un.reserved[1] * 4; /* RFC 4884 4.1 */
break;
case ICMP_REDIRECT:
@ -190,10 +188,16 @@ static int ipgre_err(struct sk_buff *skb, u32 info,
}
#if IS_ENABLED(CONFIG_IPV6)
if (tpi->proto == htons(ETH_P_IPV6) &&
!ip6_err_gen_icmpv6_unreach(skb, iph->ihl * 4 + tpi->hdr_len,
type, data_len))
return 0;
if (tpi->proto == htons(ETH_P_IPV6)) {
unsigned int data_len = 0;
if (type == ICMP_TIME_EXCEEDED)
data_len = icmp_hdr(skb)->un.reserved[1] * 4; /* RFC 4884 4.1 */
if (!ip6_err_gen_icmpv6_unreach(skb, iph->ihl * 4 + tpi->hdr_len,
type, data_len))
return 0;
}
#endif
if (t->parms.iph.daddr == 0 ||


@ -2493,6 +2493,11 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
}
niov = skb_frag_net_iov(frag);
if (!net_is_devmem_iov(niov)) {
err = -ENODEV;
goto out;
}
end = start + skb_frag_size(frag);
copy = end - offset;
@ -2511,7 +2516,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
/* Will perform the exchange later */
dmabuf_cmsg.frag_token = tcp_xa_pool.tokens[tcp_xa_pool.idx];
dmabuf_cmsg.dmabuf_id = net_iov_binding_id(niov);
dmabuf_cmsg.dmabuf_id = net_devmem_iov_binding_id(niov);
offset += copy;
remaining_len -= copy;


@ -86,6 +86,11 @@ enum {
NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
};
enum {
__NETDEV_A_IO_URING_PROVIDER_INFO_MAX,
NETDEV_A_IO_URING_PROVIDER_INFO_MAX = (__NETDEV_A_IO_URING_PROVIDER_INFO_MAX - 1)
};
enum {
NETDEV_A_PAGE_POOL_ID = 1,
NETDEV_A_PAGE_POOL_IFINDEX,
@ -94,6 +99,7 @@ enum {
NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
NETDEV_A_PAGE_POOL_DETACH_TIME,
NETDEV_A_PAGE_POOL_DMABUF,
NETDEV_A_PAGE_POOL_IO_URING,
__NETDEV_A_PAGE_POOL_MAX,
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
@ -136,6 +142,7 @@ enum {
NETDEV_A_QUEUE_TYPE,
NETDEV_A_QUEUE_NAPI_ID,
NETDEV_A_QUEUE_DMABUF,
NETDEV_A_QUEUE_IO_URING,
__NETDEV_A_QUEUE_MAX,
NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)


@ -17,9 +17,11 @@ get_hdr_inc=-D$(1) -include $(UAPI_PATH)/linux/$(2)
CFLAGS_devlink:=$(call get_hdr_inc,_LINUX_DEVLINK_H_,devlink.h)
CFLAGS_dpll:=$(call get_hdr_inc,_LINUX_DPLL_H,dpll.h)
CFLAGS_ethtool:=$(call get_hdr_inc,_LINUX_ETHTOOL_H,ethtool.h) \
$(call get_hdr_inc,_LINUX_ETHTOOL_NETLINK_H_,ethtool_netlink.h)
$(call get_hdr_inc,_LINUX_ETHTOOL_NETLINK_H_,ethtool_netlink.h) \
$(call get_hdr_inc,_LINUX_ETHTOOL_NETLINK_GENERATED_H,ethtool_netlink_generated.h)
CFLAGS_handshake:=$(call get_hdr_inc,_LINUX_HANDSHAKE_H,handshake.h)
CFLAGS_mptcp_pm:=$(call get_hdr_inc,_LINUX_MPTCP_PM_H,mptcp_pm.h)
CFLAGS_net_shaper:=$(call get_hdr_inc,_LINUX_NET_SHAPER_H,net_shaper.h)
CFLAGS_netdev:=$(call get_hdr_inc,_LINUX_NETDEV_H,netdev.h)
CFLAGS_nlctrl:=$(call get_hdr_inc,__LINUX_GENERIC_NETLINK_H,genetlink.h)
CFLAGS_nfsd:=$(call get_hdr_inc,_LINUX_NFSD_NETLINK_H,nfsd_netlink.h)


@ -100,7 +100,7 @@ class Type(SpecAttr):
if isinstance(value, int):
return value
if value in self.family.consts:
raise Exception("Resolving family constants not implemented, yet")
return self.family.consts[value]["value"]
return limit_to_number(value)
def get_limit_str(self, limit, default=None, suffix=''):
@ -110,6 +110,9 @@ class Type(SpecAttr):
if isinstance(value, int):
return str(value) + suffix
if value in self.family.consts:
const = self.family.consts[value]
if const.get('header'):
return c_upper(value)
return c_upper(f"{self.family['name']}-{value}")
return c_upper(value)
@ -2549,6 +2552,9 @@ def render_uapi(family, cw):
defines = []
for const in family['definitions']:
if const.get('header'):
continue
if const['type'] != 'const':
cw.writes_defines(defines)
defines = []


@ -7,6 +7,7 @@ TEST_INCLUDES := $(wildcard lib/py/*.py) \
TEST_PROGS := \
netcons_basic.sh \
netcons_fragmented_msg.sh \
netcons_overflow.sh \
ping.py \
queues.py \


@ -1 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
iou-zcrx
ncdevmem


@ -1,5 +1,7 @@
# SPDX-License-Identifier: GPL-2.0+ OR MIT
TEST_GEN_FILES = iou-zcrx
TEST_PROGS = \
csum.py \
devlink_port_split.py \
@ -10,6 +12,7 @@ TEST_PROGS = \
ethtool_rmon.sh \
hw_stats_l3.sh \
hw_stats_l3_gre.sh \
iou-zcrx.py \
loopback.sh \
nic_link_layer.py \
nic_performance.py \
@ -38,3 +41,5 @@ include ../../../lib.mk
# YNL build
YNL_GENS := ethtool netdev
include ../../../net/ynl.mk
$(OUTPUT)/iou-zcrx: LDLIBS += -luring

Some files were not shown because too many files have changed in this diff.