mirror-linux

Commit Graph

Author	SHA1	Message	Date
Felix Moessbauer	ed4fb6d7ef	hrtimer: Use and report correct timerslack values for realtime tasks The timerslack_ns setting is used to specify how much the hardware timers should be delayed, to potentially dispatch multiple timers in a single interrupt. This is a performance optimization. Timers of realtime tasks (having a realtime scheduling policy) should not be delayed. This logic was inconsitently applied to the hrtimers, leading to delays of realtime tasks which used timed waits for events (e.g. condition variables). Due to the downstream override of the slack for rt tasks, the procfs reported incorrect (non-zero) timerslack_ns values. This is changed by setting the timer_slack_ns task attribute to 0 for all tasks with a rt policy. By that, downstream users do not need to specially handle rt tasks (w.r.t. the slack), and the procfs entry shows the correct value of "0". Setting non-zero slack values (either via procfs or PR_SET_TIMERSLACK) on tasks with a rt policy is ignored, as stated in "man 2 PR_SET_TIMERSLACK": Timer slack is not applied to threads that are scheduled under a real-time scheduling policy (see sched_setscheduler(2)). The special handling of timerslack on rt tasks in downstream users is removed as well. Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240814121032.368444-2-felix.moessbauer@siemens.com	2024-08-23 20:13:02 +02:00
Ingo Molnar	402de7fc88	sched: Fix spelling in comments Do a spell-checking pass. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2024-05-27 17:00:21 +02:00
Ingo Molnar	04746ed80b	sched/syscalls: Split out kernel/sched/syscalls.c from kernel/sched/core.c core.c has become rather large, move most scheduler syscall related functionality into a separate file, syscalls.c. This is about ~15% of core.c's raw linecount. Move the alloc_user_cpus_ptr(), __rt_effective_prio(), rt_effective_prio(), uclamp_none(), uclamp_se_set() and uclamp_bucket_id() inlines to kernel/sched/sched.h. Internally export the __sched_setscheduler(), __sched_setaffinity(), __setscheduler_prio(), set_load_weight(), enqueue_task(), dequeue_task(), check_class_changed(), splice_balance_callbacks() and balance_callbacks() methods to better facilitate this. Move the new file's build to sched_policy.c, because it fits there semantically, but also because it's the smallest of the 4 build units under an allmodconfig build: -rw-rw-r-- 1 mingo mingo 7.3M May 27 12:35 kernel/sched/core.i -rw-rw-r-- 1 mingo mingo 6.4M May 27 12:36 kernel/sched/build_utility.i -rw-rw-r-- 1 mingo mingo 6.3M May 27 12:36 kernel/sched/fair.i -rw-rw-r-- 1 mingo mingo 5.8M May 27 12:36 kernel/sched/build_policy.i This better balances build time for scheduler subsystem rebuilds. I build-tested this new file as a standalone syscalls.o file for a bit, to make sure all the encapsulations & abstractions are robust. Also update/add my copyright notices to these files. Build time measurements: # -Before/+After: kepler:~/tip> perf stat -e 'cycles,instructions,duration_time' --sync --repeat 5 --pre 'rm -f kernel/sched/*.o' m kernel/sched/built-in.a >/dev/null Performance counter stats for 'm kernel/sched/built-in.a' (5 runs): - 71,938,508,607 cycles ( +- 0.17% ) + 71,992,916,493 cycles ( +- 0.22% ) - 106,214,780,964 instructions # 1.48 insn per cycle ( +- 0.01% ) + 105,450,231,154 instructions # 1.46 insn per cycle ( +- 0.01% ) - 5,878,232,620 ns duration_time ( +- 0.38% ) + 5,290,085,069 ns duration_time ( +- 0.21% ) - 5.8782 +- 0.0221 seconds time elapsed ( +- 0.38% ) + 5.2901 +- 0.0111 seconds time elapsed ( +- 0.21% ) Build time improvement of -11.1% (duration_time) is expected: the parallel build time of the scheduler subsystem is determined by the largest, slowest to build object file, which is kernel/sched/core.o. By moving ~15% of its complexity into another build unit, we reduced build time by -11%. Measured cycles spent on building is within its ~0.2% stddev noise envelope. The -0.7% reduction in instructions spent on building the scheduler is statistically reliable and somewhat surprising - I can only speculate: maybe compilers aren't that efficient at building & optimizing 10+ KLOC files (core.c), and it's an overall win to balance the linecount a bit. Anyway, this might be a data point that suggests that reducing the linecount of our largest files will improve not just code readability and maintainability, but might also improve build times a bit. Code generation got a bit worse, by 0.5kb text on an x86 defconfig build: # -Before/+After: kepler:~/tip> size vmlinux text data bss dec hex filename -26475475 10439178 1740804 38655457 24dd5e1 vmlinux +26476003 10439178 1740804 38655985 24dd7f1 vmlinux kepler:~/tip> size kernel/sched/built-in.a text data bss dec hex filename - 76056 30025 489 106570 1a04a kernel/sched/core.o (ex kernel/sched/built-in.a) + 63452 29453 489 93394 16cd2 kernel/sched/core.o (ex kernel/sched/built-in.a) 44299 2181 104 46584 b5f8 kernel/sched/fair.o (ex kernel/sched/built-in.a) - 42764 3424 120 46308 b4e4 kernel/sched/build_policy.o (ex kernel/sched/built-in.a) + 55651 4044 120 59815 e9a7 kernel/sched/build_policy.o (ex kernel/sched/built-in.a) 44866 12655 2192 59713 e941 kernel/sched/build_utility.o (ex kernel/sched/built-in.a) 44866 12655 2192 59713 e941 kernel/sched/build_utility.o (ex kernel/sched/built-in.a) This is primarily due to the extra functions exported, and the size gets exaggerated somewhat by __pfx CFI function padding: ffffffff810cc710 <__pfx_enqueue_task>: ffffffff810cc710: 90 nop ffffffff810cc711: 90 nop ffffffff810cc712: 90 nop ffffffff810cc713: 90 nop ffffffff810cc714: 90 nop ffffffff810cc715: 90 nop ffffffff810cc716: 90 nop ffffffff810cc717: 90 nop ffffffff810cc718: 90 nop ffffffff810cc719: 90 nop ffffffff810cc71a: 90 nop ffffffff810cc71b: 90 nop ffffffff810cc71c: 90 nop ffffffff810cc71d: 90 nop ffffffff810cc71e: 90 nop ffffffff810cc71f: 90 nop AFAICS the cost is primarily not to core.o and fair.o though (which contain most performance sensitive scheduler functions), only to syscalls.o that get called with much lower frequency - so I think this is an acceptable trade-off for better code separation. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20240407084319.1462211-2-mingo@kernel.org	2024-05-27 13:56:10 +02:00

Author

SHA1

Message

Date

Felix Moessbauer

ed4fb6d7ef

hrtimer: Use and report correct timerslack values for realtime tasks

The timerslack_ns setting is used to specify how much the hardware
timers should be delayed, to potentially dispatch multiple timers in a
single interrupt. This is a performance optimization. Timers of
realtime tasks (having a realtime scheduling policy) should not be
delayed.

This logic was inconsitently applied to the hrtimers, leading to delays
of realtime tasks which used timed waits for events (e.g. condition
variables). Due to the downstream override of the slack for rt tasks,
the procfs reported incorrect (non-zero) timerslack_ns values.

This is changed by setting the timer_slack_ns task attribute to 0 for
all tasks with a rt policy. By that, downstream users do not need to
specially handle rt tasks (w.r.t. the slack), and the procfs entry
shows the correct value of "0". Setting non-zero slack values (either
via procfs or PR_SET_TIMERSLACK) on tasks with a rt policy is ignored,
as stated in "man 2 PR_SET_TIMERSLACK":

  Timer slack is not applied to threads that are scheduled under a
  real-time scheduling policy (see sched_setscheduler(2)).

The special handling of timerslack on rt tasks in downstream users
is removed as well.

Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240814121032.368444-2-felix.moessbauer@siemens.com

2024-08-23 20:13:02 +02:00

Ingo Molnar

402de7fc88

sched: Fix spelling in comments

Do a spell-checking pass.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2024-05-27 17:00:21 +02:00

Ingo Molnar

04746ed80b

sched/syscalls: Split out kernel/sched/syscalls.c from kernel/sched/core.c

core.c has become rather large, move most scheduler syscall
related functionality into a separate file, syscalls.c.

This is about ~15% of core.c's raw linecount.

Move the alloc_user_cpus_ptr(), __rt_effective_prio(),
rt_effective_prio(), uclamp_none(), uclamp_se_set()
and uclamp_bucket_id() inlines to kernel/sched/sched.h.

Internally export the __sched_setscheduler(), __sched_setaffinity(),
__setscheduler_prio(), set_load_weight(), enqueue_task(), dequeue_task(),
check_class_changed(), splice_balance_callbacks() and balance_callbacks()
methods to better facilitate this.

Move the new file's build to sched_policy.c, because it fits there
semantically, but also because it's the smallest of the 4 build units
under an allmodconfig build:

  -rw-rw-r-- 1 mingo mingo 7.3M May 27 12:35 kernel/sched/core.i
  -rw-rw-r-- 1 mingo mingo 6.4M May 27 12:36 kernel/sched/build_utility.i
  -rw-rw-r-- 1 mingo mingo 6.3M May 27 12:36 kernel/sched/fair.i
  -rw-rw-r-- 1 mingo mingo 5.8M May 27 12:36 kernel/sched/build_policy.i

This better balances build time for scheduler subsystem rebuilds.

I build-tested this new file as a standalone syscalls.o file for a bit,
to make sure all the encapsulations & abstractions are robust.

Also update/add my copyright notices to these files.

Build time measurements:

 # -Before/+After:

 kepler:~/tip> perf stat -e 'cycles,instructions,duration_time' --sync --repeat 5 --pre 'rm -f kernel/sched/*.o' m kernel/sched/built-in.a >/dev/null

 Performance counter stats for 'm kernel/sched/built-in.a' (5 runs):

 -    71,938,508,607      cycles                                                                  ( +-  0.17% )
 +    71,992,916,493      cycles                                                                  ( +-  0.22% )
 -   106,214,780,964      instructions                     #    1.48  insn per cycle              ( +-  0.01% )
 +   105,450,231,154      instructions                     #    1.46  insn per cycle              ( +-  0.01% )
 -     5,878,232,620 ns   duration_time                                                           ( +-  0.38% )
 +     5,290,085,069 ns   duration_time                                                           ( +-  0.21% )

 -            5.8782 +- 0.0221 seconds time elapsed  ( +-  0.38% )
 +            5.2901 +- 0.0111 seconds time elapsed  ( +-  0.21% )

Build time improvement of -11.1% (duration_time) is expected: the
parallel build time of the scheduler subsystem is determined by the
largest, slowest to build object file, which is kernel/sched/core.o.
By moving ~15% of its complexity into another build unit, we reduced
build time by -11%.

Measured cycles spent on building is within its ~0.2% stddev noise envelope.

The -0.7% reduction in instructions spent on building the scheduler is
statistically reliable and somewhat surprising - I can only speculate:
maybe compilers aren't that efficient at building & optimizing 10+ KLOC files
(core.c), and it's an overall win to balance the linecount a bit.

Anyway, this might be a data point that suggests that reducing the linecount
of our largest files will improve not just code readability and maintainability,
but might also improve build times a bit.

Code generation got a bit worse, by 0.5kb text on an x86 defconfig build:

  # -Before/+After:

  kepler:~/tip> size vmlinux
     text	   data	    bss	    dec	    hex	filename
  -26475475	10439178	1740804	38655457	24dd5e1	vmlinux
  +26476003	10439178	1740804	38655985	24dd7f1	vmlinux

  kepler:~/tip> size kernel/sched/built-in.a
     text	   data	    bss	    dec	    hex	filename
  - 76056	  30025	    489	 106570	  1a04a	kernel/sched/core.o (ex kernel/sched/built-in.a)
  + 63452	  29453	    489	  93394	  16cd2	kernel/sched/core.o (ex kernel/sched/built-in.a)
    44299	   2181	    104	  46584	   b5f8	kernel/sched/fair.o (ex kernel/sched/built-in.a)
  - 42764	   3424	    120	  46308	   b4e4	kernel/sched/build_policy.o (ex kernel/sched/built-in.a)
  + 55651	   4044	    120	  59815	   e9a7	kernel/sched/build_policy.o (ex kernel/sched/built-in.a)
    44866	  12655	   2192	  59713	   e941	kernel/sched/build_utility.o (ex kernel/sched/built-in.a)
    44866	  12655	   2192	  59713	   e941	kernel/sched/build_utility.o (ex kernel/sched/built-in.a)

This is primarily due to the extra functions exported, and the size
gets exaggerated somewhat by __pfx CFI function padding:

	ffffffff810cc710 <__pfx_enqueue_task>:
	ffffffff810cc710:	90                   	nop
	ffffffff810cc711:	90                   	nop
	ffffffff810cc712:	90                   	nop
	ffffffff810cc713:	90                   	nop
	ffffffff810cc714:	90                   	nop
	ffffffff810cc715:	90                   	nop
	ffffffff810cc716:	90                   	nop
	ffffffff810cc717:	90                   	nop
	ffffffff810cc718:	90                   	nop
	ffffffff810cc719:	90                   	nop
	ffffffff810cc71a:	90                   	nop
	ffffffff810cc71b:	90                   	nop
	ffffffff810cc71c:	90                   	nop
	ffffffff810cc71d:	90                   	nop
	ffffffff810cc71e:	90                   	nop
	ffffffff810cc71f:	90                   	nop

AFAICS the cost is primarily not to core.o and fair.o though (which contain
most performance sensitive scheduler functions), only to syscalls.o
that get called with much lower frequency - so I think this is an acceptable
trade-off for better code separation.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Link: https://lore.kernel.org/r/20240407084319.1462211-2-mingo@kernel.org

2024-05-27 13:56:10 +02:00

3 Commits (c56f9ecb7fb6a3a90079c19eb4c8daf3bbf514b3)