Linux kernel source tree
 
 
 
 
 
 
Go to file
Mel Gorman e837456fdc sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals
Reimplement NEXT_BUDDY preemption to take into account the deadline and
eligibility of the wakee with respect to the waker. In the event
multiple buddies could be considered, the one with the earliest deadline
is selected.

Sync wakeups are treated differently to every other type of wakeup. The
WF_SYNC assumption is that the waker promises to sleep in the very near
future. This is violated in enough cases that WF_SYNC should be treated
as a suggestion instead of a contract. If a waker does go to sleep almost
immediately then the delay in wakeup is negligible. In other cases, it's
throttled based on the accumulated runtime of the waker so there is a
chance that some batched wakeups have been issued before preemption.

For all other wakeups, preemption happens if the wakee has a earlier
deadline than the waker and eligible to run.

While many workloads were tested, the two main targets were a modified
dbench4 benchmark and hackbench because the are on opposite ends of the
spectrum -- one prefers throughput by avoiding preemption and the other
relies on preemption.

First is the dbench throughput data even though it is a poor metric but
it is the default metric. The test machine is a 2-socket machine and the
backing filesystem is XFS as a lot of the IO work is dispatched to kernel
threads. It's important to note that these results are not representative
across all machines, especially Zen machines, as different bottlenecks
are exposed on different machines and filesystems.

dbench4 Throughput (misleading but traditional)
                            6.18-rc1               6.18-rc1
                             vanilla   sched-preemptnext-v5
Hmean     1       1268.80 (   0.00%)     1269.74 (   0.07%)
Hmean     4       3971.74 (   0.00%)     3950.59 (  -0.53%)
Hmean     7       5548.23 (   0.00%)     5420.08 (  -2.31%)
Hmean     12      7310.86 (   0.00%)     7165.57 (  -1.99%)
Hmean     21      8874.53 (   0.00%)     9149.04 (   3.09%)
Hmean     30      9361.93 (   0.00%)    10530.04 (  12.48%)
Hmean     48      9540.14 (   0.00%)    11820.40 (  23.90%)
Hmean     79      9208.74 (   0.00%)    12193.79 (  32.42%)
Hmean     110     8573.12 (   0.00%)    11933.72 (  39.20%)
Hmean     141     7791.33 (   0.00%)    11273.90 (  44.70%)
Hmean     160     7666.60 (   0.00%)    10768.72 (  40.46%)

As throughput is misleading, the benchmark is modified to use a short
loadfile report the completion time duration in milliseconds.

dbench4 Loadfile Execution Time
                             6.18-rc1               6.18-rc1
                              vanilla   sched-preemptnext-v5
Amean      1         14.62 (   0.00%)       14.69 (  -0.46%)
Amean      4         18.76 (   0.00%)       18.85 (  -0.45%)
Amean      7         23.71 (   0.00%)       24.38 (  -2.82%)
Amean      12        31.25 (   0.00%)       31.87 (  -1.97%)
Amean      21        45.12 (   0.00%)       43.69 (   3.16%)
Amean      30        61.07 (   0.00%)       54.33 (  11.03%)
Amean      48        95.91 (   0.00%)       77.22 (  19.49%)
Amean      79       163.38 (   0.00%)      123.08 (  24.66%)
Amean      110      243.91 (   0.00%)      175.11 (  28.21%)
Amean      141      343.47 (   0.00%)      239.10 (  30.39%)
Amean      160      401.15 (   0.00%)      283.73 (  29.27%)
Stddev     1          0.52 (   0.00%)        0.51 (   2.45%)
Stddev     4          1.36 (   0.00%)        1.30 (   4.04%)
Stddev     7          1.88 (   0.00%)        1.87 (   0.72%)
Stddev     12         3.06 (   0.00%)        2.45 (  19.83%)
Stddev     21         5.78 (   0.00%)        3.87 (  33.06%)
Stddev     30         9.85 (   0.00%)        5.25 (  46.76%)
Stddev     48        22.31 (   0.00%)        8.64 (  61.27%)
Stddev     79        35.96 (   0.00%)       18.07 (  49.76%)
Stddev     110       59.04 (   0.00%)       30.93 (  47.61%)
Stddev     141       85.38 (   0.00%)       40.93 (  52.06%)
Stddev     160       96.38 (   0.00%)       39.72 (  58.79%)

That is still looking good and the variance is reduced quite a bit.
Finally, fairness is a concern so the next report tracks how many
milliseconds does it take for all clients to complete a workfile. This
one is tricky because dbench makes to effort to synchronise clients so
the durations at benchmark start time differ substantially from typical
runtimes. This problem could be mitigated by warming up the benchmark
for a number of minutes but it's a matter of opinion whether that
counts as an evasion of inconvenient results.

dbench4 All Clients Loadfile Execution Time
                             6.18-rc1               6.18-rc1
                              vanilla   sched-preemptnext-v5
Amean      1         15.06 (   0.00%)       15.07 (  -0.03%)
Amean      4        603.81 (   0.00%)      524.29 (  13.17%)
Amean      7        855.32 (   0.00%)     1331.07 ( -55.62%)
Amean      12      1890.02 (   0.00%)     2323.97 ( -22.96%)
Amean      21      3195.23 (   0.00%)     2009.29 (  37.12%)
Amean      30     13919.53 (   0.00%)     4579.44 (  67.10%)
Amean      48     25246.07 (   0.00%)     5705.46 (  77.40%)
Amean      79     29701.84 (   0.00%)    15509.26 (  47.78%)
Amean      110    22803.03 (   0.00%)    23782.08 (  -4.29%)
Amean      141    36356.07 (   0.00%)    25074.20 (  31.03%)
Amean      160    17046.71 (   0.00%)    13247.62 (  22.29%)
Stddev     1          0.47 (   0.00%)        0.49 (  -3.74%)
Stddev     4        395.24 (   0.00%)      254.18 (  35.69%)
Stddev     7        467.24 (   0.00%)      764.42 ( -63.60%)
Stddev     12      1071.43 (   0.00%)     1395.90 ( -30.28%)
Stddev     21      1694.50 (   0.00%)     1204.89 (  28.89%)
Stddev     30      7945.63 (   0.00%)     2552.59 (  67.87%)
Stddev     48     14339.51 (   0.00%)     3227.55 (  77.49%)
Stddev     79     16620.91 (   0.00%)     8422.15 (  49.33%)
Stddev     110    12912.15 (   0.00%)    13560.95 (  -5.02%)
Stddev     141    20700.13 (   0.00%)    14544.51 (  29.74%)
Stddev     160     9079.16 (   0.00%)     7400.69 (  18.49%)

This is more of a mixed bag but it at least shows that fairness
is not crippled.

The hackbench results are more neutral but this is still important.
It's possible to boost the dbench figures by a large amount but only by
crippling the performance of a workload like hackbench. The WF_SYNC
behaviour is important for these workloads and is why the WF_SYNC
changes are not a separate patch.

hackbench-process-pipes
                          6.18-rc1             6.18-rc1
                             vanilla   sched-preemptnext-v5
Amean     1        0.2657 (   0.00%)      0.2150 (  19.07%)
Amean     4        0.6107 (   0.00%)      0.6060 (   0.76%)
Amean     7        0.7923 (   0.00%)      0.7440 (   6.10%)
Amean     12       1.1500 (   0.00%)      1.1263 (   2.06%)
Amean     21       1.7950 (   0.00%)      1.7987 (  -0.20%)
Amean     30       2.3207 (   0.00%)      2.5053 (  -7.96%)
Amean     48       3.5023 (   0.00%)      3.9197 ( -11.92%)
Amean     79       4.8093 (   0.00%)      5.2247 (  -8.64%)
Amean     110      6.1160 (   0.00%)      6.6650 (  -8.98%)
Amean     141      7.4763 (   0.00%)      7.8973 (  -5.63%)
Amean     172      8.9560 (   0.00%)      9.3593 (  -4.50%)
Amean     203     10.4783 (   0.00%)     10.8347 (  -3.40%)
Amean     234     12.4977 (   0.00%)     13.0177 (  -4.16%)
Amean     265     14.7003 (   0.00%)     15.5630 (  -5.87%)
Amean     296     16.1007 (   0.00%)     17.4023 (  -8.08%)

Processes using pipes are impacted but the variance (not presented) indicates
it's close to noise and the results are not always reproducible. If executed
across multiple reboots, it may show neutral or small gains so the worst
measured results are presented.

Hackbench using sockets is more reliably neutral as the wakeup
mechanisms are different between sockets and pipes.

hackbench-process-sockets
                          6.18-rc1             6.18-rc1
                             vanilla   sched-preemptnext-v2
Amean     1        0.3073 (   0.00%)      0.3263 (  -6.18%)
Amean     4        0.7863 (   0.00%)      0.7930 (  -0.85%)
Amean     7        1.3670 (   0.00%)      1.3537 (   0.98%)
Amean     12       2.1337 (   0.00%)      2.1903 (  -2.66%)
Amean     21       3.4683 (   0.00%)      3.4940 (  -0.74%)
Amean     30       4.7247 (   0.00%)      4.8853 (  -3.40%)
Amean     48       7.6097 (   0.00%)      7.8197 (  -2.76%)
Amean     79      14.7957 (   0.00%)     16.1000 (  -8.82%)
Amean     110     21.3413 (   0.00%)     21.9997 (  -3.08%)
Amean     141     29.0503 (   0.00%)     29.0353 (   0.05%)
Amean     172     36.4660 (   0.00%)     36.1433 (   0.88%)
Amean     203     39.7177 (   0.00%)     40.5910 (  -2.20%)
Amean     234     42.1120 (   0.00%)     43.5527 (  -3.42%)
Amean     265     45.7830 (   0.00%)     50.0560 (  -9.33%)
Amean     296     50.7043 (   0.00%)     54.3657 (  -7.22%)

As schbench has been mentioned in numerous bugs recently, the results
are interesting. A test case that represents the default schbench
behaviour is

schbench Wakeup Latency (usec)
                                       6.18.0-rc1             6.18.0-rc1
                                          vanilla   sched-preemptnext-v5
Amean     Wakeup-50th-80          7.17 (   0.00%)        6.00 (  16.28%)
Amean     Wakeup-90th-80         46.56 (   0.00%)       19.78 (  57.52%)
Amean     Wakeup-99th-80        119.61 (   0.00%)       89.94 (  24.80%)
Amean     Wakeup-99.9th-80     3193.78 (   0.00%)      328.22 (  89.72%)

schbench Requests Per Second (ops/sec)
                                  6.18.0-rc1             6.18.0-rc1
                                     vanilla   sched-preemptnext-v5
Hmean     RPS-20th-80     8900.91 (   0.00%)     9176.78 (   3.10%)
Hmean     RPS-50th-80     8987.41 (   0.00%)     9217.89 (   2.56%)
Hmean     RPS-90th-80     9123.73 (   0.00%)     9273.25 (   1.64%)
Hmean     RPS-max-80      9193.50 (   0.00%)     9301.47 (   1.17%)

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251112122521.1331238-3-mgorman@techsingularity.net
2025-11-17 17:13:15 +01:00
Documentation tty/serial driver bugfixes for 6.18-rc3 2025-10-26 10:24:39 -07:00
LICENSES LICENSES: Replace the obsolete address of the FSF in the GFDL-1.2 2025-07-24 11:15:39 +02:00
arch Merge branch 'linus/master' into sched/core, to resolve conflict 2025-10-29 08:42:28 +01:00
block block-6.18-20251023 2025-10-24 12:48:19 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto This push contains the following changes: 2025-10-10 08:56:16 -07:00
drivers Char/Misc driver fixes for 6.18-rc3 2025-10-26 10:33:46 -07:00
fs - Remove dead code leftovers after a recent mitigations cleanup which fail 2025-10-26 09:57:18 -07:00
include sched/deadline: Fix dl_server stop condition 2025-11-11 12:33:39 +01:00
init printk changes for 6.18 2025-10-04 11:13:11 -07:00
io_uring io_uring: fix buffer auto-commit for multishot uring_cmd 2025-10-23 19:41:31 -06:00
ipc namespace-6.18-rc1 2025-09-29 11:20:29 -07:00
kernel sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals 2025-11-17 17:13:15 +01:00
lib lib/crypto: poly1305: Restore dependency of arch code on !KMSAN 2025-10-22 10:52:10 -07:00
mm slab fixes for 6.18-rc3 2025-10-24 12:40:51 -07:00
net Including fixes from can. Slim pickings, I'm guessing people haven't 2025-10-23 07:03:18 -10:00
rust Driver core fixes for 6.18-rc3 2025-10-25 11:03:46 -07:00
samples Char/Misc/IIO/Binder changes for 6.18-rc1 2025-10-04 16:26:32 -07:00
scripts Kbuild fixes for 6.18 #1 2025-10-11 15:47:12 -07:00
security integrity-v6.18 2025-10-05 10:48:33 -07:00
sound ALSA: hda/realtek: Fix mute led for HP Omen 17-cb0xxx 2025-10-17 16:37:21 +02:00
tools - Fix x32 build due to wrong format specifier on that sub-arch 2025-10-26 09:44:36 -07:00
usr gen_init_cpio: Ignore fsync() returning EINVAL on pipes 2025-10-07 09:53:05 -07:00
virt KVM x86 fixes for 6.18: 2025-10-18 10:25:43 +02:00
.clang-format memblock: drop for_each_free_mem_pfn_range_in_zone_from() 2025-09-14 08:49:03 +03:00
.clippy.toml rust: clean Rust 1.88.0's warning about `clippy::disallowed_macros` configuration 2025-05-07 00:11:47 +02:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: remove Alyssa Rosenzweig 2025-09-18 21:17:31 +02:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore .gitignore: ignore compile_commands.json globally 2025-08-12 15:53:55 -07:00
.mailmap MAINTAINERS: Update Alex Williamson's email address 2025-10-20 15:45:03 -06:00
.pylintrc tools: docs: parse-headers.py: move it from sphinx dir 2025-08-29 15:54:42 -06:00
.rustfmt.toml
COPYING
CREDITS USB/Thunderbolt changes for 6.18-rc1 2025-10-04 16:07:08 -07:00
Kbuild sched: Make migrate_{en,dis}able() inline 2025-09-25 09:57:16 +02:00
Kconfig io_uring: Rename KConfig to Kconfig 2025-02-19 14:53:27 -07:00
MAINTAINERS io_uring-6.18-20251023 2025-10-24 12:44:31 -07:00
Makefile Linux 6.18-rc3 2025-10-26 15:59:49 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.