mirror-linux/include/linux/sched
Tejun Heo 7900aa699c sched_ext: Fix cgroup exit ordering by moving sched_ext_free() to finish_task_switch()
sched_ext_free() was called from __put_task_struct() when the last reference
to the task is dropped, which could be long after the task has finished
running. This causes cgroup-related problems:

- ops.init_task() can be called on a cgroup which didn't get ops.cgroup_init()'d
  during scheduler load, because the cgroup might be destroyed/unlinked
  while the zombie or dead task is still lingering on the scx_tasks list.

- ops.cgroup_exit() could be called before ops.exit_task() is called on all
  member tasks, leading to incorrect exit ordering.

Fix by moving it to finish_task_switch() to be called right after the final
context switch away from the dying task, matching when sched_class->task_dead()
is called. Rename it to sched_ext_dead() to match the new calling context.

By calling sched_ext_dead() before cgroup_task_dead(), we ensure that:

- Tasks visible on scx_tasks list have valid cgroups during scheduler load,
  as cgroup_mutex prevents cgroup destruction while the task is still linked.

- All member tasks have ops.exit_task() called and are removed from scx_tasks
  before the cgroup can be destroyed and trigger ops.cgroup_exit().
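
The ordering guarantee above can be sketched as a small user-space toy model in C. This is not the kernel implementation: every toy_* name, the event log, and the rule that a cgroup is torn down as soon as its last task dies are assumptions made for illustration. It only shows why running the sched_ext teardown before the cgroup teardown keeps each ops.exit_task() call ahead of ops.cgroup_exit().

```c
#include <assert.h>
#include <string.h>

/* Toy model only: names mirror the commit message, not actual kernel code. */
#define MAX_EVENTS 16

struct toy_cgroup {
	int nr_live_tasks;	/* tasks still attached to this cgroup */
	int exited;		/* set once the modeled ops.cgroup_exit() ran */
};

struct toy_task {
	struct toy_cgroup *cgrp;
	int on_scx_tasks;	/* models membership on the scx_tasks list */
};

static const char *events[MAX_EVENTS];
static int nr_events;

static void record(const char *ev)
{
	if (nr_events < MAX_EVENTS)
		events[nr_events++] = ev;
}

/* Models sched_ext_dead(): run ops.exit_task() and unlink from scx_tasks. */
static void toy_sched_ext_dead(struct toy_task *p)
{
	record("exit_task");
	p->on_scx_tasks = 0;
}

/*
 * Models cgroup_task_dead(): drop the task from its cgroup. Simplification:
 * the toy cgroup is destroyed as soon as its last task is gone.
 */
static void toy_cgroup_task_dead(struct toy_task *p)
{
	/* Invariant from the commit message: already off scx_tasks here. */
	assert(!p->on_scx_tasks);
	if (--p->cgrp->nr_live_tasks == 0) {
		record("cgroup_exit");
		p->cgrp->exited = 1;
	}
}

/* Models the dying-task path of finish_task_switch() after this fix. */
static void toy_finish_task_switch_dead(struct toy_task *p)
{
	toy_sched_ext_dead(p);		/* scheduler-side teardown first */
	toy_cgroup_task_dead(p);	/* only then may the cgroup go away */
}

/* Returns 1 if no exit_task event was recorded after a cgroup_exit event. */
static int toy_exit_order_ok(void)
{
	int seen_cgroup_exit = 0;

	for (int i = 0; i < nr_events; i++) {
		if (!strcmp(events[i], "cgroup_exit"))
			seen_cgroup_exit = 1;
		else if (seen_cgroup_exit)
			return 0;
	}
	return 1;
}
```

To try it, create one toy_cgroup with nr_live_tasks set to the number of tasks, call toy_finish_task_switch_dead() on each task, and check toy_exit_order_ok(). Swapping the two calls inside toy_finish_task_switch_dead() trips the assertion, which is the misordering the commit describes.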

This fix is made possible by the cgroup_task_dead() split in the previous patch.

This also makes more sense resource-wise as there's no point in keeping
scheduler side resources around for dead tasks.

Reported-by: Dan Schatzberg <dschatzberg@meta.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-03 11:57:30 -10:00
affinity.h
autogroup.h
clock.h
cond_resched.h
coredump.h mm: update coredump logic to correctly use bitmap mm flags 2025-09-13 16:54:57 -07:00
cpufreq.h
cputime.h
deadline.h
debug.h
ext.h sched_ext: Fix cgroup exit ordering by moving sched_ext_free() to finish_task_switch() 2025-11-03 11:57:30 -10:00
hotplug.h
idle.h
init.h
isolation.h
jobctl.h
loadavg.h
mm.h mm: constify arch_pick_mmap_layout() for improved const-correctness 2025-09-21 14:22:14 -07:00
nohz.h
numa_balancing.h
posix-timers.h
prio.h
rseq_api.h
rt.h
sd_flags.h
signal.h cgroup: replace global percpu_rwsem with per threadgroup resem when writing to cgroup.procs 2025-09-10 07:44:51 -10:00
smt.h
stat.h
sysctl.h
task.h Patch series in this pull request: 2025-10-02 18:44:54 -07:00
task_flags.h
task_stack.h
thread_info_api.h
topology.h sched: Move STDL_INIT() functions out-of-line 2025-09-03 10:03:13 +02:00
types.h
user.h
vhost_task.h
wake_q.h
xacct.h