In killProcessGroup we currently read cgroup.procs to find processes to
kill, send them kill signals until cgroup.procs is empty, then remove
the cgroup directory. The cgroup cannot be removed until all processes
are dead; otherwise we'll get an EBUSY error from the kernel.
There is a race in the kernel where cgroup.procs can read empty even
though the cgroup is pinned by processes which are still exiting, and
can't be removed yet. [1]
Let's use the populated field of cgroup.events instead of an empty
cgroup.procs file to determine when the cgroup is removable. In
addition to functioning as we expect, this is more efficient because
we can poll on cgroup.events instead of retrying kills and rereading
cgroup.procs every 5ms, which should help reduce CPU contention and
cgroup lock contention.
It's still possible that it takes longer for a cgroup to become
unpopulated than our timeout allows, in which case we will fail to
remove the cgroup and leak kernel memory. But this change should help
reduce the probability of that happening.
[1] https://lore.kernel.org/all/CABdmKX3SOXpcK85a7cx3iXrwUj=i1yXqEz9i9zNkx8mB=ZXQ8A@mail.gmail.com/
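A minimal sketch of the approach, assuming a helper along these lines
(the function name, buffer size, and timeout policy are illustrative,
not the actual libprocessgroup code):

    #include <fcntl.h>
    #include <poll.h>
    #include <string.h>
    #include <unistd.h>

    // Wait until cgroup.events reports "populated 0", i.e. the cgroup
    // holds no live or exiting processes and can be removed. The kernel
    // signals changes to cgroup.events via POLLPRI.
    static bool WaitForEmptyCgroup(const char* events_path, int timeout_ms) {
        int fd = open(events_path, O_RDONLY);
        if (fd < 0) return false;
        char buf[256];
        for (;;) {
            ssize_t n = pread(fd, buf, sizeof(buf) - 1, 0);
            if (n <= 0) break;
            buf[n] = '\0';
            if (strstr(buf, "populated 0") != nullptr) {
                close(fd);
                return true;
            }
            struct pollfd pfd = {fd, POLLPRI, 0};
            if (poll(&pfd, 1, timeout_ms) <= 0) break;  // timeout or error
        }
        close(fd);
        return false;
    }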
Bug: 301871933
Change-Id: If7dcfb331f47e06994c9ac85ed08bbcce18cdad7
The current COW size estimate is computed by taking the offset of the
next data write. However, during COW size estimation we repeatedly
increment op_count_max. Since the data section is placed right after
the op section, incrementing op_count_max has the effect of moving the
beginning of the data section. next_data_pos_ is never updated in this
process, causing the final estimate to be off.
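To illustrate the layout dependency (a worked sketch; the names mirror
this CL but the actual estimation code differs):

    #include <cstdint>

    struct CowOperationV3 { uint64_t source_info_; };  // placeholder op

    // The data section begins right after the op section, so its start
    // depends on op_count_max:
    static uint64_t DataSectionStart(uint64_t header_size,
                                     uint64_t op_count_max) {
        return header_size + op_count_max * sizeof(CowOperationV3);
    }

    // Buggy estimation: next_data_pos_ keeps the offset computed from
    // the original op_count_max even after op_count_max grows, so every
    // later data offset, and the final size estimate, is stale. The fix
    // is to recompute it whenever op_count_max changes:
    //     next_data_pos_ = DataSectionStart(header_size, op_count_max);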
Test: th
Bug: 313962438
Change-Id: I250dff54c470c9c20d6db33d91bac898358dee31
Since the current implementation just does a bitwise OR with the
storage unit, multiple calls to set_source would result in overlapping
bits. Fix by first clearing the existing storage.
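A minimal sketch of the fix (the field width and mask are assumptions
for illustration, not the actual CowOperation layout):

    #include <cstdint>

    inline constexpr uint64_t kSourceMask = (1ULL << 48) - 1;  // assumed

    struct CowOperation {
        uint64_t source_info_;

        // Buggy: `source_info_ |= (value & kSourceMask);` leaves stale
        // bits from an earlier call. Fixed: clear the storage first,
        // then OR in the new value.
        void set_source(uint64_t value) {
            source_info_ &= ~kSourceMask;
            source_info_ |= (value & kSourceMask);
        }
    };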
Test: th
Bug: 313962438
Change-Id: Iecfe8dd244c0f65ecd3cacb0404fdc39ef836d97
This is done so that we can depend on it elsewhere without pulling in
all the unrelated methods.
Needed for ag/24553347
Bug: 296207744
Test: refactoring build
Change-Id: I7c6733208f3ae63ba9559753a24cffcb8e1b9d1e
This is a no-op but will be used in upcoming scudo changes that allow
changing the depot size at process startup time; as a result, we will
no longer be able to call __scudo_get_stack_depot_size in debuggerd.
We already did the equivalent change for the ring buffer size in
https://r.android.com/q/topic:%22scudo_ring_buffer_size%22
Bug: 309446692
Change-Id: I761a7602c54a1f8f2d0575c5e011820d8dbaab63
Allow O_DIRECT reads on the source block device.
This will further cut down the Active and Inactive file pages
during partition verification.
On Pixel 6 after an incremental OTA, post-OTA reboot:
                  Without patch   With patch   Delta
----------------------------------------------------
Inactive(File):   4992MB          3887MB       ~22%
Active(File):     1465MB          1014MB       ~30%
Boot time, however, increases from 25 to 30 seconds.
This is not yet enabled; it will be enabled later, either behind a
sysprop flag or only for low-memory devices.
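A minimal sketch of such a read (device path, 4096-byte alignment, and
error handling are illustrative; O_DIRECT requires the buffer, offset,
and length to be aligned to the device's logical block size):

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdlib>

    // Read `len` bytes at `off`, bypassing the page cache so the read
    // does not inflate Inactive(file)/Active(file).
    static ssize_t DirectRead(const char* dev, void*& buf, size_t len,
                              off_t off) {
        int fd = open(dev, O_RDONLY | O_DIRECT);
        if (fd < 0) return -1;
        if (posix_memalign(&buf, 4096, len) != 0) {
            close(fd);
            return -1;
        }
        ssize_t n = pread(fd, buf, len, off);
        close(fd);
        return n;  // caller frees buf
    }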
Additionally, set the priority of worker threads to normal and reduce
the priority of merge threads. This will help low-memory devices, as
tested on Pixel Watch.
Bug: 311233916
Test: OTA on Pixel 6
Change-Id: Icacdef08d68e28d3062611477703e7cf393a9f10
Signed-off-by: Akilesh Kailash <akailash@google.com>
Scanning of partitions post OTA leads to memory pressure. Tune
the read-ahead of the source and COW block devices. This is currently
set to 32KB.
This reduces Inactive(file) and Active(file) usage during the entire
duration of boot post OTA.
On Pixel 6, for an incremental OTA of ~400M, during boot:
                       Without-patch   With-patch   Delta
---------------------------------------------------------
Peak Inactive(file):   4469MB          3118MB       ~30%
Peak Active(file):     985MB           712MB        ~27%
No regression observed on boot time.
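A minimal sketch of the tuning via sysfs (the device name in the usage
line is hypothetical; the 32KB value is the one from this CL):

    #include <fstream>
    #include <string>

    // Set the block-layer read-ahead for a device, in kilobytes.
    static bool SetReadAheadKb(const std::string& block_dev, int kb) {
        std::ofstream f("/sys/block/" + block_dev + "/queue/read_ahead_kb");
        if (!f.is_open()) return false;
        f << kb;
        return f.good();
    }

    // Usage: SetReadAheadKb("dm-5", 32);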
Additionally, cut down the number of threads used to verify the
partitions.
Bug: 311233916
Test: Incremental OTA on Pixel 6
Change-Id: I0b842776c36fa089c39c170fa7bf0f246e16636d
Signed-off-by: Akilesh Kailash <akailash@google.com>
This does not change the default COW version; v2 is still the
default. This CL only enables OEMs to set virtual_ab_cow_version to 3
for testing purposes.
Test: th
Bug: 313962438
Change-Id: I7a328fa32283560a48604ffe02edd2551ac49a83
We can shove type into source_info to save 8 bits per cow operation.
We only need 4 bits inside source_info to enumerate all the types of
CowOperation. Since CowOperationV3 is not used on disk (just yet), we
can make format changes.
This CL is mechanical:
1. Remove the .type field from the CowOperation struct
2. Add a type() getter method to the CowOperation struct
3. Replace all existing usage of the `type` member with the new getter
No functional changes, just refactoring.
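A hedged sketch of the packing (the exact bit positions are an
assumption; the CL only states that 4 bits inside source_info suffice):

    #include <cstdint>

    inline constexpr int kTypeShift = 60;        // assumed: top 4 bits
    inline constexpr uint64_t kTypeMask = 0xF;

    struct CowOperationV3 {
        uint64_t source_info_;  // low bits: source; top bits: op type

        // Getter replacing the removed .type field.
        uint8_t type() const {
            return (source_info_ >> kTypeShift) & kTypeMask;
        }
        void set_type(uint8_t type) {
            source_info_ &= ~(kTypeMask << kTypeShift);
            source_info_ |= (uint64_t{type} & kTypeMask) << kTypeShift;
        }
    };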
Test: th
Bug: 304602386
Bug: 313962438
Change-Id: I85d89c71fc6afede12ea299a4a3e3b2184ea2d8b
Provide an additional way to request the block layer to prioritize
foreground I/O over background I/O. This patch prepares for tests with
the mq-deadline I/O scheduler. While CFQ and BFQ support a "weight"
cgroup attribute, the mq-deadline scheduler does not. The mq-deadline
I/O scheduler only supports the I/O priority class for differentiating
I/O priorities.
The io.prio.class attribute is declared optional since this attribute
only exists if CONFIG_BLK_CGROUP_IOPRIO != n in the kernel
configuration.
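A minimal sketch of exercising the attribute (the cgroup path is
illustrative; valid policy strings in the kernel's blk-ioprio
documentation include "no-change", "restrict-to-be", and "idle"):

    #include <fstream>
    #include <string>

    // Write an I/O priority class policy for a cgroup; only works if
    // the kernel was built with CONFIG_BLK_CGROUP_IOPRIO.
    static bool SetIoPrioClass(const std::string& cgroup_dir,
                               const std::string& policy) {
        std::ofstream f(cgroup_dir + "/io.prio.class");
        if (!f.is_open()) return false;
        f << policy;
        return f.good();
    }

    // Usage: SetIoPrioClass("/sys/fs/cgroup/background", "restrict-to-be");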
Bug: 186902601
Change-Id: Iee1004cd0996e32245aff2b51772ef40178e024f
Signed-off-by: Bart Van Assche <bvanassche@google.com>