Take action if we are running out of checkpoint space.
Configurable via ro.sys properties.
ro.sys.cp_usleeptime = Time to sleep between checks
ro.sys.cp_min_free_bytes = Min free space to act on
ro.sys.cp_commit_on_full = action to take. Either commits or reboots to
continue attempt without checkpoint, or retry
and eventually abort OTA
Test: Trigger a checkpoint and fill the disk.
Bug: 119769392
Change-Id: I977cc03b7aef9320d661c8a0d716f8a1ef0be347
abortChanges will attempt to pass a reboot message, and will only reboot
if the device is currently checkpointing. Additionally, it can opt to
attempt to prevent future attempts. This only works for non-bootloader
controlled updates. Failures are ignored, as it will always reboot the
device. In the unlikely event of such a failure, the device will
continue to retry as though you did not ask to prevent future attempts.
Test: vdc checkpoint abortChanges abort_retry_test 1
vdc checkpoint abortChanges abort_noretry_test 0
Change-Id: I7b6214765a1faaf4fd193c73331696b53ae572d2
This makes needCheckpoint return true when the device will or is using
checkpointing.
Test: vdc checkpoint startCheckpoint 1
reboot
vdc checkpoint needsCheckpoint
should return 1 before and after data mounts, and 0 once the
checkpoint has been committed
Change-Id: Ib57f4461d837f41a8110ed318168165a684d913a
Also add vdc checkpoint supportsFileCheckpoint
This is to allow tests to be specific to supported checkpoint mode.
Test: Built on Taimen and Crosshatch, made sure both new functions work
as expected
Change-Id: I0eab7453b13c0a2e31840ef9ad24a692cec55b00
The boot failure symptom is reproduced on Walleye devices. System boots
up after taking OTA and try to upgrade key, but keymaster returns "failed
to ugprade key". Device reboots to recovery mode because of the failure,
and finally trapped in bootloader screen. Possible scenario is:
(After taking OTA)
vold sends old key and op=UPGRADE to keymaster
keymaster creates and saves new key to RPMB, responses new key to vold
vold saves new key as temp key
vold renames temp key to main key -------------- (1) -- still in cache
vold sends old key and op=DELETE_KEY to keymaster
keymaster removes old key from RPMB ------------ (2) -- write directly to RPMB
==> SYSTEM INTERRUPTED BY CRASH OR SOMETHING; ALL CACHE LOST.
==> System boots up, key in RPMB is deleted but key in storage is old key.
Solution: A Fsync is required between (1) and (2) to cover this case.
Detail analysis: b/124279741#comment21
Bug: 112145641
Bug: 124279741
Test: Insert fault right after deleteKey in vold::begin (KeyStorage.cpp),
original boot failure symptom is NOT reproducible.
Change-Id: Ib8c349d6d033f86b247f4b35b8354d97cf249d26
This allows us to resume rolling back in the event of an unexpected
shutdown during the restore process. We save progress after we process
each log sector, and whenever restarting the current log sector would
result in invalid data.
Test: Run restore, interrupt it, and attempt to resume
Change-Id: I91cf0defb0d22fc5afdb9debc2963c956e9e171c
Restores the first n entries of a checkpoint. Allows automated testing
of interrupted restores.
Test: vdc checkpoint restoreCheckpoint [device] [n]
Change-Id: I47570e8eba0bc3c6549a04a33600df05d393990b
In preparation for restore code, we need to guarantee fsync happens.
Switch over to fd based operations to prepare for that.
Test: Successfully restores device over reboots
Change-Id: Ic9901779e8a4258bf8090d6a62fa9829e343fd39
Motivation:
Early processes launched before the runtime APEX - that hosts the bionic
libs - is activated can't use the bionic libs from the APEX, but from the
system partition (which we call the bootstrap bionic). Other processes
after the APEX activation should use the bionic libs from the APEX.
In order to let both types of processes to access the bionic libs via
the same standard paths /system/lib/{libc|libdl|libm}.so, some mount
namespace magic is used.
To be specific, when the device boots, the init initially bind-mounts
the bootstrap bionic libs to the standard paths with MS_PRIVATE. Early
processes are then executed with their own mount namespaces (via
unshare(CLONE_NEWNS)). After the runtime APEX is activated, init
bind-mounts the bionic libs in the APEX to the same standard paths.
Processes launched thereafter use the bionic libs from the APEX (which
can be updated.)
Important thing is that, since the propagation type of the mount points
(the standard paths) is 'private', the new bind-mount events for the
updated bionic libs should not affect the early processes. Otherwise,
they would experience sudden change of bionic libs at runtime. However,
other mount/unmounts events outside of the private mount points are
still shared across early/late processes as before. This is made possible
because the propagation type of / is 'shared' .
Problem:
vold uses the equality of the mount namespace to filter-out processes
that share the global mount namespace (the namespace of the init). However,
due to the aforementioned change, the early processes are not filtered
out because they have different mount namespaces. As a result,
umount2("/storage/") is executed on them and this unmount event
becomes visible to the global mount namespace (because as mentioned before /
is 'shared').
Solution:
Fiter-out the early processes by skipping a native (non-Java) process
whose UID is < AID_APP. The former condition is because all early
processes are native ones; i.e., zygote is started after the runtime
APEX is activated. The latter condition is to not filter-out native
processes created locally by apps.
Bug: 120266448
Test: m; device boots
Change-Id: I054deedc4af8421854cf35be84e14995523a259a
I'm not convinced this explains the full regression, but it's a
worthwhile fix anyway.
Bug: 124774357
Test: Booted in checkpoint mode and non checkpoint mode
Change-Id: I6e0e1e59e27bd127feac218fff7d88bb3570b530
When running a live GSI, userdata is a logical partition. If we don't
fix up the fstab we'll derive the underlying block device instead of
the device-mapper node for userdat_gsi, resulting in a corrupt data
partition for both images.
Bug: 123906417
Test: manual test
Change-Id: Ic0101f30504de26e725442da2da3888008c31b63
This marks the slot as successful within commitChanges, increasing the
available roll back window significantly.
Test: When taking an update on a checkpoint enabled device, it
marks the slot as successful just before committing the
checkpoint. Visible in logs as call to vdc commitChanges,
followed by "Marked slot as booted succesfully."
Bug: 123260515
Change-Id: If71fcde57b3bdee2cfaabb590f123a2d00da3228
VoldUtils already has a pre-parsed fstab. Use it instead.
Test: Checkpoint functions continue to work
Change-Id: I96cbab467a7b809c92c4f6cdf7a06abca8c5aa5e
cryptfs.cpp and MetadataCrypt.cpp can use android::vold::sFsckContext directly.
hash.h is unuseful.
Test: make
Change-Id: I7acdac97d6ed1c9b2a5dc367fcea8aa2942192e8
Log the main configuration of the dm-crypt device -- the name, the
cipher, the keysize, the real device, and the length -- in addition to
the extra parameters which we were already logging.
(We can't simply log the actual string passed to the kernel, of course,
because that includes the key. So we choose the fields individually.)
Test: booted device configured to use FDE and checked the log message
Change-Id: Ia95de807c4fad68d93b7e7e73508a01e5139dc76
This is needed to make adoptable storage volumes work with a 4K crypto
sector size when the block device size is not a multiple of 4K.
It is fine to do this because the filesystem ends on a 4K boundary
anyway and doesn't use any partial block at the end.
Bug: 123375298
Test: booted device configured to use FDE with sector size 4k, ran
'sm set-virtual-disk true' and formatted the virtual SD card as
adoptable storage. Then did the same but with a temporary patch
that changed kSizeVirtualDisk to be misaligned
Change-Id: I95ee6d7dcaaa8989c674aea9988c09116e830b0c
Copy the existing mount options when remounting f2fs for checkpointing
mode.
Bug: 123376509
Test: Boot with checkpointing, and ensure entries match fstab
Change-Id: If022d9872a44657b550ab892259230805716dc77