platform_system_core

History

processgroup	libprocessgroup: Poll on cgroup.events	2023-12-07 00:12:00 +00:00

T.J. Mercier 3b5bb3a364 libprocessgroup: Poll on cgroup.events

In killProcessGroup we currently read cgroup.procs to find processes to
kill, send them kill signals until cgroup.procs is empty, then remove
the cgroup directory. The cgroup cannot be removed until all processes
are dead, otherwise we'll get an EBUSY error from the kernel.

There is a race in the kernel where cgroup.procs can read empty even
though the cgroup is still pinned by processes which are exiting, so
the cgroup can't be removed yet. [1]

Let's use the populated field of cgroup.events instead of an empty
cgroup.procs file to determine when the cgroup is removable. In
addition to behaving as we expect, this is more efficient because we
can poll on cgroup.events instead of retrying kills and rereading
cgroup.procs every 5ms, which should help reduce CPU contention and
cgroup lock contention.
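
The polling approach described above could look roughly like the
following. This is a minimal sketch, not the code from this change:
ParsePopulated and WaitForUnpopulated are hypothetical names, and it
assumes a cgroup v2 cgroup.events file whose modifications are reported
to poll() as POLLPRI, the way kernfs notifications usually are.

```cpp
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#include <cerrno>
#include <chrono>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: parses cgroup.events contents, which are
// "key value" pairs, one per line (e.g. "populated 1").
// Returns nullopt if no "populated" key is found.
std::optional<bool> ParsePopulated(const std::string& contents) {
    std::istringstream in(contents);
    std::string key;
    int value = 0;
    while (in >> key >> value) {
        if (key == "populated") return value != 0;
    }
    return std::nullopt;
}

// Hypothetical helper: blocks until the populated field reads 0.
// Returns true if the cgroup became unpopulated, false on timeout or error.
bool WaitForUnpopulated(const std::string& events_path,
                        std::chrono::milliseconds timeout) {
    int fd = open(events_path.c_str(), O_RDONLY | O_CLOEXEC);
    if (fd < 0) return false;

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    bool unpopulated = false;
    for (;;) {
        // Re-read from the start each time; the file is small.
        char buf[256];
        if (lseek(fd, 0, SEEK_SET) < 0) break;
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n < 0) break;

        std::optional<bool> populated =
                ParsePopulated(std::string(buf, static_cast<size_t>(n)));
        if (!populated.has_value()) break;
        if (!*populated) {
            unpopulated = true;
            break;
        }

        auto remaining = std::chrono::duration_cast<std::chrono::milliseconds>(
                deadline - std::chrono::steady_clock::now());
        if (remaining.count() <= 0) break;  // timed out while still populated

        // Sleep until the kernel reports a change instead of busy-retrying.
        struct pollfd pfd = {};
        pfd.fd = fd;
        pfd.events = POLLPRI;
        int ret = poll(&pfd, 1, static_cast<int>(remaining.count()));
        if (ret < 0 && errno != EINTR) break;
    }
    close(fd);
    return unpopulated;
}
```

A caller would send the kill signals once, call WaitForUnpopulated on
the cgroup's cgroup.events path with its timeout, and only attempt to
remove the cgroup directory if it returns true.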

It's still possible that it takes longer for a cgroup to become
unpopulated than our timeout allows, in which case we will fail to
remove the cgroup and leak kernel memory. But this change should help
reduce the probability of that happening.

[1] https://lore.kernel.org/all/CABdmKX3SOXpcK85a7cx3iXrwUj=i1yXqEz9i9zNkx8mB=ZXQ8A@mail.gmail.com/

Bug: 301871933
Change-Id: If7dcfb331f47e06994c9ac85ed08bbcce18cdad7

2023-12-07 00:12:00 +00:00
