Take action if we are running out of checkpoint space.
Configurable via ro.sys properties.
ro.sys.cp_usleeptime = Time to sleep between checks
ro.sys.cp_min_free_bytes = Min free space to act on
ro.sys.cp_commit_on_full = action to take. Either commits or reboots to
continue attempt without checkpoint, or retry
and eventually abort OTA
Test: Trigger a checkpoint and fill the disk.
Bug: 119769392
Change-Id: I977cc03b7aef9320d661c8a0d716f8a1ef0be347
abortChanges will attempt to pass a reboot message, and will only reboot
if the device is currently checkpointing. Additionally, it can opt to
attempt to prevent future attempts. This only works for non-bootloader
controlled updates. Failures are ignored, as it will always reboot the
device. In the unlikely event of such a failure, the device will
continue to retry as though you did not ask to prevent future attempts.
Test: vdc checkpoint abortChanges abort_retry_test 1
vdc checkpoint abortChanges abort_noretry_test 0
Change-Id: I7b6214765a1faaf4fd193c73331696b53ae572d2
This makes needCheckpoint return true when the device will or is using
checkpointing.
Test: vdc checkpoint startCheckpoint 1
reboot
vdc checkpoint needsCheckpoint
should return 1 before and after data mounts, and 0 once the
checkpoint has been committed
Change-Id: Ib57f4461d837f41a8110ed318168165a684d913a
Also add vdc checkpoint supportsFileCheckpoint
This is to allow tests to be specific to supported checkpoint mode.
Test: Built on Taimen and Crosshatch, made sure both new functions work
as expected
Change-Id: I0eab7453b13c0a2e31840ef9ad24a692cec55b00
The boot failure symptom is reproduced on Walleye devices. System boots
up after taking OTA and try to upgrade key, but keymaster returns "failed
to ugprade key". Device reboots to recovery mode because of the failure,
and finally trapped in bootloader screen. Possible scenario is:
(After taking OTA)
vold sends old key and op=UPGRADE to keymaster
keymaster creates and saves new key to RPMB, responses new key to vold
vold saves new key as temp key
vold renames temp key to main key -------------- (1) -- still in cache
vold sends old key and op=DELETE_KEY to keymaster
keymaster removes old key from RPMB ------------ (2) -- write directly to RPMB
==> SYSTEM INTERRUPTED BY CRASH OR SOMETHING; ALL CACHE LOST.
==> System boots up, key in RPMB is deleted but key in storage is old key.
Solution: A Fsync is required between (1) and (2) to cover this case.
Detail analysis: b/124279741#comment21
Bug: 112145641
Bug: 124279741
Test: Insert fault right after deleteKey in vold::begin (KeyStorage.cpp),
original boot failure symptom is NOT reproducible.
Change-Id: Ia042b23699c37c94758fb660aecec64d39f39738
Merged-In: Ib8c349d6d033f86b247f4b35b8354d97cf249d26
The boot failure symptom is reproduced on Walleye devices. System boots
up after taking OTA and try to upgrade key, but keymaster returns "failed
to ugprade key". Device reboots to recovery mode because of the failure,
and finally trapped in bootloader screen. Possible scenario is:
(After taking OTA)
vold sends old key and op=UPGRADE to keymaster
keymaster creates and saves new key to RPMB, responses new key to vold
vold saves new key as temp key
vold renames temp key to main key -------------- (1) -- still in cache
vold sends old key and op=DELETE_KEY to keymaster
keymaster removes old key from RPMB ------------ (2) -- write directly to RPMB
==> SYSTEM INTERRUPTED BY CRASH OR SOMETHING; ALL CACHE LOST.
==> System boots up, key in RPMB is deleted but key in storage is old key.
Solution: A Fsync is required between (1) and (2) to cover this case.
Detail analysis: b/124279741#comment21
Bug: 112145641
Bug: 124279741
Test: Insert fault right after deleteKey in vold::begin (KeyStorage.cpp),
original boot failure symptom is NOT reproducible.
Change-Id: Ib8c349d6d033f86b247f4b35b8354d97cf249d26
This allows us to resume rolling back in the event of an unexpected
shutdown during the restore process. We save progress after we process
each log sector, and whenever restarting the current log sector would
result in invalid data.
Test: Run restore, interrupt it, and attempt to resume
Change-Id: I91cf0defb0d22fc5afdb9debc2963c956e9e171c
Restores the first n entries of a checkpoint. Allows automated testing
of interrupted restores.
Test: vdc checkpoint restoreCheckpoint [device] [n]
Change-Id: I47570e8eba0bc3c6549a04a33600df05d393990b
In preparation for restore code, we need to guarantee fsync happens.
Switch over to fd based operations to prepare for that.
Test: Successfully restores device over reboots
Change-Id: Ic9901779e8a4258bf8090d6a62fa9829e343fd39
Motivation:
Early processes launched before the runtime APEX - that hosts the bionic
libs - is activated can't use the bionic libs from the APEX, but from the
system partition (which we call the bootstrap bionic). Other processes
after the APEX activation should use the bionic libs from the APEX.
In order to let both types of processes to access the bionic libs via
the same standard paths /system/lib/{libc|libdl|libm}.so, some mount
namespace magic is used.
To be specific, when the device boots, the init initially bind-mounts
the bootstrap bionic libs to the standard paths with MS_PRIVATE. Early
processes are then executed with their own mount namespaces (via
unshare(CLONE_NEWNS)). After the runtime APEX is activated, init
bind-mounts the bionic libs in the APEX to the same standard paths.
Processes launched thereafter use the bionic libs from the APEX (which
can be updated.)
Important thing is that, since the propagation type of the mount points
(the standard paths) is 'private', the new bind-mount events for the
updated bionic libs should not affect the early processes. Otherwise,
they would experience sudden change of bionic libs at runtime. However,
other mount/unmounts events outside of the private mount points are
still shared across early/late processes as before. This is made possible
because the propagation type of / is 'shared' .
Problem:
vold uses the equality of the mount namespace to filter-out processes
that share the global mount namespace (the namespace of the init). However,
due to the aforementioned change, the early processes are not filtered
out because they have different mount namespaces. As a result,
umount2("/storage/") is executed on them and this unmount event
becomes visible to the global mount namespace (because as mentioned before /
is 'shared').
Solution:
Fiter-out the early processes by skipping a native (non-Java) process
whose UID is < AID_APP. The former condition is because all early
processes are native ones; i.e., zygote is started after the runtime
APEX is activated. The latter condition is to not filter-out native
processes created locally by apps.
Bug: 120266448
Test: m; device boots
Change-Id: I054deedc4af8421854cf35be84e14995523a259a