Linux kernel source tree
 
 
 
 
 
 
Go to file
Jesper Dangaard Brouer dc82a33297 veth: apply qdisc backpressure on full ptr_ring to reduce TX drops
In production, we're seeing TX drops on veth devices when the ptr_ring
fills up. This can occur when NAPI mode is enabled, though it's
relatively rare. However, with threaded NAPI - which we use in
production - the drops become significantly more frequent.

The underlying issue is that with threaded NAPI, the consumer often runs
on a different CPU than the producer. This increases the likelihood of
the ring filling up before the consumer gets scheduled, especially under
load, leading to drops in veth_xmit() (ndo_start_xmit()).

This patch introduces backpressure by returning NETDEV_TX_BUSY when the
ring is full, signaling the qdisc layer to requeue the packet. The txq
(netdev queue) is stopped in this condition and restarted once
veth_poll() drains entries from the ring, ensuring coordination between
NAPI and qdisc.

Backpressure is only enabled when a qdisc is attached. Without a qdisc,
the driver retains its original behavior - dropping packets immediately
when the ring is full. This avoids unexpected behavior changes in setups
without a configured qdisc.

With a qdisc in place (e.g. fq, sfq) this allows Active Queue Management
(AQM) to fairly schedule packets across flows and reduce collateral
damage from elephant flows.

A known limitation of this approach is that the full ring sits in front
of the qdisc layer, effectively forming a FIFO buffer that introduces
base latency. While AQM still improves fairness and mitigates flow
dominance, the latency impact is measurable.

In hardware drivers, this issue is typically addressed using BQL (Byte
Queue Limits), which tracks in-flight bytes needed based on physical link
rate. However, for virtual drivers like veth, there is no fixed bandwidth
constraint - the bottleneck is CPU availability and the scheduler's ability
to run the NAPI thread. It is unclear how effective BQL would be in this
context.

This patch serves as a first step toward addressing TX drops. Future work
may explore adapting a BQL-like mechanism to better suit virtual devices
like veth.

Reported-by: Yan Zhai <yan@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Link: https://patch.msgid.link/174559294022.827981.1282809941662942189.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28 14:06:58 -07:00
Documentation rxrpc: Remove deadcode 2025-04-24 17:03:45 -07:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
arch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-04-24 11:20:52 -07:00
block vfs-6.15-rc3.fixes.2 2025-04-19 14:31:08 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto crypto: scomp - Fix off-by-one bug when calculating last page 2025-04-23 09:32:57 +08:00
drivers veth: apply qdisc backpressure on full ptr_ring to reduce TX drops 2025-04-28 14:06:58 -07:00
fs Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-04-24 11:20:52 -07:00
include net: sched: generalize check for no-queue qdisc on TX queue 2025-04-28 14:06:58 -07:00
init Kconfig: switch CONFIG_SYSFS_SYCALL default to n 2025-04-15 10:28:35 +02:00
io_uring io_uring/zcrx: fix late dma unmap for a dead dev 2025-04-18 06:12:10 -06:00
ipc treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
kernel Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-04-24 11:20:52 -07:00
lib hardening fixes for v6.15-rc3 2025-04-18 13:20:20 -07:00
mm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-04-24 11:20:52 -07:00
net tcp: fastopen: pass TFO child indication through getsockopt 2025-04-24 18:21:04 -07:00
rust rust: helpers: Add dma_alloc_attrs() and dma_free_attrs() 2025-04-15 23:06:03 +02:00
samples Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-04-17 12:26:50 -07:00
scripts Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-04-24 11:20:52 -07:00
security Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-04-24 11:20:52 -07:00
sound ASoC: Fixes for v6.15 2025-04-11 15:51:19 +02:00
tools io_uring/zcrx: selftests: add test case for rss ctx 2025-04-25 18:44:10 -07:00
usr kbuild: hdrcheck: fix cross build with clang 2025-03-05 04:06:45 +09:00
virt ARM: 2025-04-08 13:47:55 -07:00
.clang-format clang-format: Update the ForEachMacros list for v6.15-rc1 2025-04-13 11:03:59 +02:00
.clippy.toml rust: give Clippy the minimum supported Rust version 2025-01-10 00:17:25 +01:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: Create intermediate vmlinux build with relocations preserved 2025-03-17 00:29:50 +09:00
.mailmap sound fixes for 6.15-rc3 2025-04-17 10:14:51 -07:00
.rustfmt.toml rust: add `.rustfmt.toml` 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: update SLAB ALLOCATOR maintainers 2025-04-17 20:10:06 -07:00
Kbuild drm: ensure drm headers are self-contained and pass kernel-doc 2025-02-12 10:44:43 +02:00
Kconfig io_uring: Rename KConfig to Kconfig 2025-02-19 14:53:27 -07:00
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-04-24 11:20:52 -07:00
Makefile Fix mis-uses of 'cc-option' for warning disablement 2025-04-23 10:08:29 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.