Linux kernel source tree
 
 
 
 
 
 
Go to file
Gabriel Krisman Bertazi 92835cebab io_uring/sqpoll: Increase task_work submission batch size
Our QA team reported a 10%-23%, throughput reduction on an io_uring
sqpoll testcase doing IO to a null_blk, that I traced back to a
reduction of the device submission queue depth utilization. It turns out
that, after commit af5d68f889 ("io_uring/sqpoll: manage task_work
privately"), we capped the number of task_work entries that can be
completed from a single spin of sqpoll to only 8 entries, before the
sqpoll goes around to (potentially) sleep.  While this cap doesn't drive
the submission side directly, it impacts the completion behavior, which
affects the number of IO queued by fio per sqpoll cycle on the
submission side, and io_uring ends up seeing less ios per sqpoll cycle.
As a result, block layer plugging is less effective, and we see more
time spent inside the block layer in profilings charts, and increased
submission latency measured by fio.

There are other places that have increased overhead once sqpoll sleeps
more often, such as the sqpoll utilization calculation.  But, in this
microbenchmark, those were not representative enough in perf charts, and
their removal didn't yield measurable changes in throughput.  The major
overhead comes from the fact we plug less, and less often, when submitting
to the block layer.

My benchmark is:

fio --ioengine=io_uring --direct=1 --iodepth=128 --runtime=300 --bs=4k \
    --invalidate=1 --time_based  --ramp_time=10 --group_reporting=1 \
    --filename=/dev/nullb0 --name=RandomReads-direct-nullb-sqpoll-4k-1 \
    --rw=randread --numjobs=1 --sqthread_poll

In one machine, tested on top of Linux 6.15-rc1, we have the following
baseline:
  READ: bw=4994MiB/s (5236MB/s), 4994MiB/s-4994MiB/s (5236MB/s-5236MB/s), io=439GiB (471GB), run=90001-90001msec

With this patch:
  READ: bw=5762MiB/s (6042MB/s), 5762MiB/s-5762MiB/s (6042MB/s-6042MB/s), io=506GiB (544GB), run=90001-90001msec

which is a 15% improvement in measured bandwidth.  The average
submission latency is noticeably lowered too.  As measured by
fio:

Baseline:
   lat (usec): min=20, max=241, avg=99.81, stdev=3.38
Patched:
   lat (usec): min=26, max=226, avg=86.48, stdev=4.82

If we look at blktrace, we can also see the plugging behavior is
improved. In the baseline, we end up limited to plugging 8 requests in
the block layer regardless of the device queue depth size, while after
patching we can drive more io, and we manage to utilize the full device
queue.

In the baseline, after a stabilization phase, an ordinary submission
looks like:
  254,0    1    49942     0.016028795  5977  U   N [iou-sqp-5976] 7

After patching, I see consistently more requests per unplug.
  254,0    1     4996     0.001432872  3145  U   N [iou-sqp-3144] 32

Ideally, the cap size would at least be the deep enough to fill the
device queue, but we can't predict that behavior, or assume all IO goes
to a single device, and thus can't guess the ideal batch size.  We also
don't want to let the tw run unbounded, though I'm not sure it would
really be a problem.  Instead, let's just give it a more sensible value
that will allow for more efficient batching.  I've tested with different
cap values, and initially proposed to increase the cap to 1024.  Jens
argued it is too big of a bump and I observed that, with 32, I'm no
longer able to observe this bottleneck in any of my machines.

Fixes: af5d68f889 ("io_uring/sqpoll: manage task_work privately")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20250508181203.3785544-1-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-09 07:56:53 -06:00
Documentation Persistent buffer cleanups and simplifications for v6.15: 2025-04-03 16:09:29 -07:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
arch fixes for bugs caught as part of tree-in-dcache work 2025-04-03 21:12:48 -07:00
block block: don't grab elevator lock during queue initialization 2025-04-03 08:32:03 -06:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto This push fixes reverts the multibuffer hash testing as it is buggy. 2025-04-02 09:14:59 -07:00
drivers fixes for bugs caught as part of tree-in-dcache work 2025-04-03 21:12:48 -07:00
fs 4 ksmbd SMB3 server fixes, all also for stable 2025-04-03 16:18:06 -07:00
include io_uring/zcrx: return ifq id to the user 2025-04-15 07:37:49 -06:00
init ARM and clkdev updates for 6.15-rc1 2025-04-03 12:21:44 -07:00
io_uring io_uring/sqpoll: Increase task_work submission batch size 2025-05-09 07:56:53 -06:00
ipc treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
kernel Persistent buffer cleanups and simplifications for v6.15: 2025-04-03 16:09:29 -07:00
lib One bugfix and a couple of small late-arriving updates. 2025-04-03 11:16:57 -07:00
mm - The 2 patch series "mm: fixes for fallouts from mem_init() cleanup" 2025-04-03 11:10:00 -07:00
net 9p update for 6.15-rc1 2025-04-03 15:35:46 -07:00
rust ARM and clkdev updates for 6.15-rc1 2025-04-03 12:21:44 -07:00
samples tracing fixes for 6.15 2025-04-03 09:52:44 -07:00
scripts ARM and clkdev updates for 6.15-rc1 2025-04-03 12:21:44 -07:00
security mseal sysmap: kernel config and header change 2025-04-01 15:17:14 -07:00
sound These are objtool fixes and updates by Josh Poimboeuf, centered 2025-04-02 10:30:10 -07:00
tools io_uring-6.15-20250403 2025-04-03 15:48:58 -07:00
usr kbuild: hdrcheck: fix cross build with clang 2025-03-05 04:06:45 +09:00
virt ARM: 2025-03-25 14:22:07 -07:00
.clang-format clang-format: Update with v6.11-rc1's `for_each` macro list 2024-08-02 13:20:31 +02:00
.clippy.toml rust: give Clippy the minimum supported Rust version 2025-01-10 00:17:25 +01:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore rust: use host dylib naming convention to support macOS 2025-01-10 01:01:24 +01:00
.mailmap mailmap: add an entry for Nicolas Schier 2025-04-01 15:20:45 -07:00
.rustfmt.toml rust: add `.rustfmt.toml` 2022-09-28 09:02:20 +02:00
COPYING
CREDITS vfs-6.15-rc1.fixes 2025-04-02 16:05:21 -07:00
Kbuild drm: ensure drm headers are self-contained and pass kernel-doc 2025-02-12 10:44:43 +02:00
Kconfig io_uring: Rename KConfig to Kconfig 2025-02-19 14:53:27 -07:00
MAINTAINERS - The 2 patch series "mm: fixes for fallouts from mem_init() cleanup" 2025-04-03 11:10:00 -07:00
Makefile [ Merge note: this pull request depends on you having merged 2025-03-24 22:06:11 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.