mirror-linux/kernel/sched
Tejun Heo d245698d72 cgroup: Defer task cgroup unlink until after the task is done switching out
When a task exits, css_set_move_task(tsk, cset, NULL, false) unlinks the task
from its cgroup. From the cgroup's perspective, the task is now gone. If this
makes the cgroup empty, it can be removed, triggering ->css_offline() callbacks
that notify controllers the cgroup is going offline resource-wise.

However, the exiting task can still run, perform memory operations, and schedule
until the final context switch in finish_task_switch(). This creates a confusing
situation where controllers are told a cgroup is offline while resource
activities are still happening in it. While this hasn't broken existing
controllers, it has caused direct confusion for sched_ext schedulers.

Split cgroup_task_exit() into two functions. cgroup_task_exit() now only calls
the subsystem exit callbacks and continues to be called from do_exit(). The
css_set cleanup is moved to the new cgroup_task_dead() which is called from
finish_task_switch() after the final context switch, so that the cgroup only
appears empty after the task is truly done running.

This also reorders operations so that subsys->exit() is now called before
unlinking from the cgroup, which shouldn't break anything.
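
The ordering change can be illustrated with a toy userspace model. This is a sketch only, not kernel code: every toy_* name, struct, and field below is hypothetical and exists purely for illustration. It also assumes, as a simplification, that a cgroup goes offline the moment it becomes empty, whereas in the kernel an empty cgroup merely becomes removable. The model compares unlinking at do_exit() time with unlinking after the final context switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types, not the kernel's structures. */
struct toy_cgroup {
	int nr_tasks;	/* tasks still linked to this cgroup */
	bool offline;	/* stands in for ->css_offline() having run */
};

struct toy_task {
	struct toy_cgroup *cg;
	bool linked;		/* still linked into the cgroup */
	bool exit_cb_done;	/* subsystem exit callbacks have run */
	bool ran_while_offline;	/* did work after the cgroup went offline */
};

/* Unlink the task; simplification: an empty cgroup goes offline at once. */
static void toy_unlink(struct toy_task *t)
{
	assert(t->linked && t->cg->nr_tasks > 0);
	t->linked = false;
	if (--t->cg->nr_tasks == 0)
		t->cg->offline = true;
}

/* Models the new cgroup_task_exit(): subsystem exit callbacks only. */
static void toy_cgroup_task_exit(struct toy_task *t)
{
	t->exit_cb_done = true;
}

/* Models cgroup_task_dead(): the css_set cleanup (unlink). */
static void toy_cgroup_task_dead(struct toy_task *t)
{
	toy_unlink(t);
}

/* The exiting task keeps running until its final context switch. */
static void toy_task_runs(struct toy_task *t)
{
	if (t->cg->offline)
		t->ran_while_offline = true;
}

/* Old ordering: unlink first, then exit callbacks, then the task still runs. */
static bool toy_old_ordering(void)
{
	struct toy_cgroup cg = { .nr_tasks = 1, .offline = false };
	struct toy_task t = { .cg = &cg, .linked = true };

	toy_unlink(&t);
	t.exit_cb_done = true;
	toy_task_runs(&t);
	return t.ran_while_offline;
}

/* New ordering: exit callbacks, task runs, unlink after the final switch. */
static bool toy_new_ordering(void)
{
	struct toy_cgroup cg = { .nr_tasks = 1, .offline = false };
	struct toy_task t = { .cg = &cg, .linked = true };

	toy_cgroup_task_exit(&t);
	toy_task_runs(&t);
	toy_cgroup_task_dead(&t);
	return t.ran_while_offline;
}
```

In this model toy_old_ordering() reports that the task ran while its cgroup was already offline, and toy_new_ordering() does not, because the unlink happens only after the task has stopped running.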

Cc: Dan Schatzberg <dschatzberg@meta.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-03 11:46:18 -10:00
Makefile tracing: Disable branch profiling in noinstr code 2025-03-22 09:49:26 +01:00
autogroup.c cgroup: Rename cgroup lifecycle hooks to cgroup_task_*() 2025-11-03 11:46:18 -10:00
autogroup.h sched: Clean up and standardize #if/#else/#endif markers in sched/autogroup.[ch] 2025-06-13 08:47:14 +02:00
build_policy.c sched_ext: Move internal type and accessor definitions to ext_internal.h 2025-09-03 11:33:28 -10:00
build_utility.c sched/smp: Make SMP unconditional 2025-06-13 08:47:18 +02:00
clock.c sched: Clean up and standardize #if/#else/#endif markers in sched/clock.c 2025-06-13 08:47:14 +02:00
completion.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
core.c cgroup: Defer task cgroup unlink until after the task is done switching out 2025-11-03 11:46:18 -10:00
core_sched.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpuacct.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpudeadline.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpudeadline.h sched/smp: Make SMP unconditional 2025-06-13 08:47:18 +02:00
cpufreq.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpufreq_schedutil.c sched: Clean up and standardize #if/#else/#endif markers in sched/cpufreq_schedutil.c 2025-06-13 08:47:15 +02:00
cpupri.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpupri.h sched/smp: Make SMP unconditional 2025-06-13 08:47:18 +02:00
cputime.c sched: Clean up and standardize #if/#else/#endif markers in sched/cputime.c 2025-06-13 08:47:15 +02:00
deadline.c sched/deadline: Stop dl_server before CPU goes offline 2025-10-14 13:43:08 +02:00
debug.c sched/deadline: Always stop dl-server before changing parameters 2025-08-26 10:46:00 +02:00
ext.c Revert "sched_ext: Use rhashtable_lookup() instead of rhashtable_lookup_fast()" 2025-09-23 20:38:23 -10:00
ext.h sched_ext: Use cgroup_lock/unlock() to synchronize against cgroup operations 2025-09-03 11:36:07 -10:00
ext_idle.c sched_ext: Merge branch 'for-6.17-fixes' into for-6.18 2025-09-23 09:10:20 -10:00
ext_idle.h sched_ext: Always use SMP versions in kernel/sched/ext_idle.h 2025-06-13 14:47:59 -10:00
ext_internal.h sched_ext: Add SCX_EFLAG_INITIALIZED to indicate successful ops.init() 2025-09-23 09:03:26 -10:00
fair.c sched/fair: Fix pelt lost idle time detection 2025-10-14 13:43:08 +02:00
features.h sched/fair: Untangle NEXT_BUDDY and pick_next_task() 2024-12-09 11:48:13 +01:00
idle.c sched/smp: Use the SMP version of the idle scheduling class 2025-06-13 08:47:21 +02:00
isolation.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
loadavg.c Merge branch 'tip/sched/urgent' 2025-07-14 17:16:28 +02:00
membarrier.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
pelt.c sched: Clean up and standardize #if/#else/#endif markers in sched/pelt.[ch] 2025-06-13 08:47:17 +02:00
pelt.h sched/fair: Switch to task based throttle model 2025-09-03 10:03:14 +02:00
psi.c sched/psi: Fix psi_seq initialization 2025-08-04 10:51:22 -07:00
rq-offsets.c sched: Make migrate_{en,dis}able() inline 2025-09-25 09:57:16 +02:00
rt.c sched: Fix proxy/current (push,pull)ability 2025-07-14 17:16:33 +02:00
sched-pelt.h sched: Make clangd usable 2025-06-11 11:20:53 +02:00
sched.h Scheduler updates for v6.18: 2025-09-30 10:35:11 -07:00
smp.h sched: Make clangd usable 2025-06-11 11:20:53 +02:00
stats.c sched/smp: Use the SMP version of schedstats 2025-06-13 08:47:21 +02:00
stats.h sched: Clean up and standardize #if/#else/#endif markers in sched/stats.[ch] 2025-06-13 08:47:17 +02:00
stop_task.c sched/smp: Use the SMP version of the stop-CPU scheduling class 2025-06-13 08:47:21 +02:00
swait.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
syscalls.c sched/smp: Use the SMP version of the scheduler syscalls 2025-06-13 08:47:21 +02:00
topology.c Scheduler updates for v6.18: 2025-09-30 10:35:11 -07:00
wait.c ARM: 2025-07-30 17:14:01 -07:00
wait_bit.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00