In wait_and_unmount(), kill the processes with open files after umount()
has been failing for 2 seconds rather than 17 seconds. This avoids a
long boot delay on devices that use FDE.
Detailed explanation:
On FDE devices, vold needs to unmount the tmpfs /data in order to mount
the real, decrypted /data. On first boot, it also needs to unmount the
unencrypted /data in order to encrypt it in-place.
/data can't be unmounted if files are open inside it. In theory, init
is responsible for killing all processes with open files in /data, via
the property trigger "vold.decrypt=trigger_shutdown_framework".
However, years ago, commit 6e8440fd50 ("cryptfs: kill processes with
open files on tmpfs /data") added a fallback where vold kills the
processes itself. Since then, in practice people have increasingly been
relying on this fallback, as services keep being added that use /data
but don't get stopped by trigger_shutdown_framework.
This is slowing down boot, as vold sleeps for 17 seconds before it
actually kills the processes.
The problematic services include services that are now started
explicitly in the post-fs-data trigger rather than implicitly as part of
a class (e.g., tombstoned), as well as services that now need to be
started as part of one of the early-boot classes like core or early_hal
but can still open files in /data later (e.g. keystore2 and credstore).
Another complication is that on default-encrypted devices (devices with
no PIN/pattern/password), trigger_shutdown_framework isn't run at all,
but rather it's expected that the relevant services simply weren't
started yet. This means that we can't fix the problem just by fixing
trigger_shutdown_framework to kill all the needed processes.
Therefore, given that the vold fallback is being relied on in practice,
and FDE won't be supported much longer anyway (so simple fixes are very
much preferable here), let's just change wait_and_unmount() in vold to
use more appropriate timeouts. Instead of waiting for 17 seconds before
killing processes, just wait for 2 seconds. Keep the total timeout of
20 seconds, but spend most of it retrying killing the processes, and
only if the unmount is still failing.
This avoids the long boot delays in practice.
Bug: 187231646
Bug: 186165644
Test: Tested FDE on Cuttlefish, and checked logcat to verify that the
boot delay is gone.
Change-Id: Id06a9615a87988c8336396c49ee914b35f8d585b