mirror-linux/kernel/sched
Menglong Dong 378b770819 sched: Make migrate_{en,dis}able() inline
For now, migrate_enable and migrate_disable are global, which makes them
become hotspots in some case. Take BPF for example, the function calling
to migrate_enable and migrate_disable in BPF trampoline can introduce
significant overhead, and following is the 'perf top' of FENTRY's
benchmark (./tools/testing/selftests/bpf/bench trig-fentry):

  54.63% bpf_prog_2dcccf652aac1793_bench_trigger_fentry [k]
                 bpf_prog_2dcccf652aac1793_bench_trigger_fentry
  10.43% [kernel] [k] migrate_enable
  10.07% bpf_trampoline_6442517037 [k] bpf_trampoline_6442517037
  8.06% [kernel] [k] __bpf_prog_exit_recur
  4.11% libc.so.6 [.] syscall
  2.15% [kernel] [k] entry_SYSCALL_64
  1.48% [kernel] [k] memchr_inv
  1.32% [kernel] [k] fput
  1.16% [kernel] [k] _copy_to_user
  0.73% [kernel] [k] bpf_prog_test_run_raw_tp

So in this commit, we make migrate_enable/migrate_disable inline to obtain
better performance. The struct rq is defined internally in
kernel/sched/sched.h, and the field "nr_pinned" is accessed in
migrate_enable/migrate_disable, which makes it hard to make them inline.

Alexei Starovoitov suggests to generate the offset of "nr_pinned" in [1],
so we can define the migrate_enable/migrate_disable in
include/linux/sched.h and access "this_rq()->nr_pinned" with
"(void *)this_rq() + RQ_nr_pinned".

The offset of "nr_pinned" is generated in include/generated/rq-offsets.h
by kernel/sched/rq-offsets.c.

Generally speaking, we move the definition of migrate_enable and
migrate_disable to include/linux/sched.h from kernel/sched/core.c. The
calling to __set_cpus_allowed_ptr() is leaved in ___migrate_enable().

The "struct rq" is not available in include/linux/sched.h, so we can't
access the "runqueues" with this_cpu_ptr(), as the compilation will fail
in this_cpu_ptr() -> raw_cpu_ptr() -> __verify_pcpu_ptr():
  typeof((ptr) + 0)

So we introduce the this_rq_raw() and access the runqueues with
arch_raw_cpu_ptr/PERCPU_PTR directly.

The variable "runqueues" is not visible in the kernel modules, and export
it is not a good idea. As Peter Zijlstra advised in [2], we define and
export migrate_enable/migrate_disable in kernel/sched/core.c too, and use
them for the modules.

Before this patch, the performance of BPF FENTRY is:

  fentry         :  113.030 ± 0.149M/s
  fentry         :  112.501 ± 0.187M/s
  fentry         :  112.828 ± 0.267M/s
  fentry         :  115.287 ± 0.241M/s

After this patch, the performance of BPF FENTRY increases to:

  fentry         :  143.644 ± 0.670M/s
  fentry         :  149.764 ± 0.362M/s
  fentry         :  149.642 ± 0.156M/s
  fentry         :  145.263 ± 0.221M/s

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/CAADnVQ+5sEDKHdsJY5ZsfGDO_1SEhhQWHrt2SMBG5SYyQ+jt7w@mail.gmail.com/ [1]
Link: https://lore.kernel.org/all/20250819123214.GH4067720@noisy.programming.kicks-ass.net/ [2]
2025-09-25 09:57:16 +02:00
..
Makefile tracing: Disable branch profiling in noinstr code 2025-03-22 09:49:26 +01:00
autogroup.c sched: Clean up and standardize #if/#else/#endif markers in sched/autogroup.[ch] 2025-06-13 08:47:14 +02:00
autogroup.h sched: Clean up and standardize #if/#else/#endif markers in sched/autogroup.[ch] 2025-06-13 08:47:14 +02:00
build_policy.c sched/smp: Make SMP unconditional 2025-06-13 08:47:18 +02:00
build_utility.c sched/smp: Make SMP unconditional 2025-06-13 08:47:18 +02:00
clock.c sched: Clean up and standardize #if/#else/#endif markers in sched/clock.c 2025-06-13 08:47:14 +02:00
completion.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
core.c sched: Make migrate_{en,dis}able() inline 2025-09-25 09:57:16 +02:00
core_sched.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpuacct.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpudeadline.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpudeadline.h sched/smp: Make SMP unconditional 2025-06-13 08:47:18 +02:00
cpufreq.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpufreq_schedutil.c sched: Clean up and standardize #if/#else/#endif markers in sched/cpufreq_schedutil.c 2025-06-13 08:47:15 +02:00
cpupri.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpupri.h sched/smp: Make SMP unconditional 2025-06-13 08:47:18 +02:00
cputime.c sched: Clean up and standardize #if/#else/#endif markers in sched/cputime.c 2025-06-13 08:47:15 +02:00
deadline.c sched/deadline: Fix race in push_dl_task() 2025-09-03 10:03:12 +02:00
debug.c sched/deadline: Always stop dl-server before changing parameters 2025-08-26 10:46:00 +02:00
ext.c sched/ext: Fix invalid task state transitions on class switch 2025-08-11 06:56:37 -10:00
ext.h sched_ext: Add support for cgroup bandwidth control interface 2025-06-20 17:03:51 -10:00
ext_idle.c sched_ext: Changes for v6.17 2025-07-31 16:29:46 -07:00
ext_idle.h sched_ext: Always use SMP versions in kernel/sched/ext_idle.h 2025-06-13 14:47:59 -10:00
fair.c sched/fair: Do not balance task to a throttled cfs_rq 2025-09-15 09:38:38 +02:00
features.h sched/fair: Untangle NEXT_BUDDY and pick_next_task() 2024-12-09 11:48:13 +01:00
idle.c sched/smp: Use the SMP version of the idle scheduling class 2025-06-13 08:47:21 +02:00
isolation.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
loadavg.c Merge branch 'tip/sched/urgent' 2025-07-14 17:16:28 +02:00
membarrier.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
pelt.c sched: Clean up and standardize #if/#else/#endif markers in sched/pelt.[ch] 2025-06-13 08:47:17 +02:00
pelt.h sched/fair: Switch to task based throttle model 2025-09-03 10:03:14 +02:00
psi.c sched/psi: Fix psi_seq initialization 2025-08-04 10:51:22 -07:00
rq-offsets.c sched: Make migrate_{en,dis}able() inline 2025-09-25 09:57:16 +02:00
rt.c sched: Fix proxy/current (push,pull)ability 2025-07-14 17:16:33 +02:00
sched-pelt.h sched: Make clangd usable 2025-06-11 11:20:53 +02:00
sched.h sched/fair: Task based throttle time accounting 2025-09-03 10:03:14 +02:00
smp.h sched: Make clangd usable 2025-06-11 11:20:53 +02:00
stats.c sched/smp: Use the SMP version of schedstats 2025-06-13 08:47:21 +02:00
stats.h sched: Clean up and standardize #if/#else/#endif markers in sched/stats.[ch] 2025-06-13 08:47:17 +02:00
stop_task.c sched/smp: Use the SMP version of the stop-CPU scheduling class 2025-06-13 08:47:21 +02:00
swait.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
syscalls.c sched/smp: Use the SMP version of the scheduler syscalls 2025-06-13 08:47:21 +02:00
topology.c sched: Move STDL_INIT() functions out-of-line 2025-09-03 10:03:13 +02:00
wait.c ARM: 2025-07-30 17:14:01 -07:00
wait_bit.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00