Also add vdc checkpoint supportsFileCheckpoint
This is to allow tests to be specific to supported checkpoint mode.
Test: Built on Taimen and Crosshatch, made sure both new functions work
as expected
Change-Id: I0eab7453b13c0a2e31840ef9ad24a692cec55b00
The boot failure symptom is reproduced on Walleye devices. System boots
up after taking OTA and try to upgrade key, but keymaster returns "failed
to ugprade key". Device reboots to recovery mode because of the failure,
and finally trapped in bootloader screen. Possible scenario is:
(After taking OTA)
vold sends old key and op=UPGRADE to keymaster
keymaster creates and saves new key to RPMB, responses new key to vold
vold saves new key as temp key
vold renames temp key to main key -------------- (1) -- still in cache
vold sends old key and op=DELETE_KEY to keymaster
keymaster removes old key from RPMB ------------ (2) -- write directly to RPMB
==> SYSTEM INTERRUPTED BY CRASH OR SOMETHING; ALL CACHE LOST.
==> System boots up, key in RPMB is deleted but key in storage is old key.
Solution: A Fsync is required between (1) and (2) to cover this case.
Detail analysis: b/124279741#comment21
Bug: 112145641
Bug: 124279741
Test: Insert fault right after deleteKey in vold::begin (KeyStorage.cpp),
original boot failure symptom is NOT reproducible.
Change-Id: Ib8c349d6d033f86b247f4b35b8354d97cf249d26
This allows us to resume rolling back in the event of an unexpected
shutdown during the restore process. We save progress after we process
each log sector, and whenever restarting the current log sector would
result in invalid data.
Test: Run restore, interrupt it, and attempt to resume
Change-Id: I91cf0defb0d22fc5afdb9debc2963c956e9e171c
Restores the first n entries of a checkpoint. Allows automated testing
of interrupted restores.
Test: vdc checkpoint restoreCheckpoint [device] [n]
Change-Id: I47570e8eba0bc3c6549a04a33600df05d393990b
In preparation for restore code, we need to guarantee fsync happens.
Switch over to fd based operations to prepare for that.
Test: Successfully restores device over reboots
Change-Id: Ic9901779e8a4258bf8090d6a62fa9829e343fd39
Motivation:
Early processes launched before the runtime APEX - that hosts the bionic
libs - is activated can't use the bionic libs from the APEX, but from the
system partition (which we call the bootstrap bionic). Other processes
after the APEX activation should use the bionic libs from the APEX.
In order to let both types of processes to access the bionic libs via
the same standard paths /system/lib/{libc|libdl|libm}.so, some mount
namespace magic is used.
To be specific, when the device boots, the init initially bind-mounts
the bootstrap bionic libs to the standard paths with MS_PRIVATE. Early
processes are then executed with their own mount namespaces (via
unshare(CLONE_NEWNS)). After the runtime APEX is activated, init
bind-mounts the bionic libs in the APEX to the same standard paths.
Processes launched thereafter use the bionic libs from the APEX (which
can be updated.)
Important thing is that, since the propagation type of the mount points
(the standard paths) is 'private', the new bind-mount events for the
updated bionic libs should not affect the early processes. Otherwise,
they would experience sudden change of bionic libs at runtime. However,
other mount/unmounts events outside of the private mount points are
still shared across early/late processes as before. This is made possible
because the propagation type of / is 'shared' .
Problem:
vold uses the equality of the mount namespace to filter-out processes
that share the global mount namespace (the namespace of the init). However,
due to the aforementioned change, the early processes are not filtered
out because they have different mount namespaces. As a result,
umount2("/storage/") is executed on them and this unmount event
becomes visible to the global mount namespace (because as mentioned before /
is 'shared').
Solution:
Fiter-out the early processes by skipping a native (non-Java) process
whose UID is < AID_APP. The former condition is because all early
processes are native ones; i.e., zygote is started after the runtime
APEX is activated. The latter condition is to not filter-out native
processes created locally by apps.
Bug: 120266448
Test: m; device boots
Change-Id: I054deedc4af8421854cf35be84e14995523a259a
I'm not convinced this explains the full regression, but it's a
worthwhile fix anyway.
Bug: 124774357
Test: Booted in checkpoint mode and non checkpoint mode
Change-Id: I6e0e1e59e27bd127feac218fff7d88bb3570b530
When running a live GSI, userdata is a logical partition. If we don't
fix up the fstab we'll derive the underlying block device instead of
the device-mapper node for userdat_gsi, resulting in a corrupt data
partition for both images.
Bug: 123906417
Test: manual test
Change-Id: Ic0101f30504de26e725442da2da3888008c31b63
This marks the slot as successful within commitChanges, increasing the
available roll back window significantly.
Test: When taking an update on a checkpoint enabled device, it
marks the slot as successful just before committing the
checkpoint. Visible in logs as call to vdc commitChanges,
followed by "Marked slot as booted succesfully."
Bug: 123260515
Change-Id: If71fcde57b3bdee2cfaabb590f123a2d00da3228