The I/O path flows as follows:
1: When there is an I/O request for a given sector, we first
check the in-memory COW operation mapping for that sector.
2: If the mapping of sector to COW operation is found, then the
existing I/O path will work seamlessly. Even if the COW operation
encodes multiple blocks, we will discard the remaining data.
3: If the mapping of sector to COW operation is not found:
a: Find the previous COW operation as the vector has sorted sectors.
b: If the previous COW operation is a REPLACE op:
i: Check if the current sector is encoded in the previous COW
operation's compressed block.
ii: If the sector falls within the range of compressed blocks,
retrieve the block offset.
iii: De-compress the COW operation based on the compression
factor.
iv: memcpy the data based on the block offset.
v: Cache the COW operation pointer, as subsequent I/O requests
are sequential and can then be served by a memcpy at the correct offset.
c: If the previous COW operation is not a REPLACE op or if the
requested sector does not fall within the compression factor
of the previous COW operation, then fall back and read the data
from the base device.
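The lookup in step 3 can be sketched as follows. This is a minimal illustration, not the snapuserd code: the `Op` and `Hit` types and the `FindOp` helper are hypothetical, and the sketch works in 4k block numbers rather than sectors. An exact match on a REPLACE op simply yields offset 0; ops of other types are left to the existing path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for a COW operation (not the libsnapshot type).
struct Op {
    uint64_t start_block;  // first block covered by the op
    uint32_t num_blocks;   // blocks encoded by the op (compression factor / 4k)
    bool is_replace;       // true for REPLACE ops
};

// Index of the op that holds a block, plus the block's offset inside the
// op's decompressed buffer.
struct Hit {
    size_t op_index;
    uint32_t block_offset;
};

// Returns the hit for `block`, or nullopt to signal "read from the base device".
// Precondition: `ops` is sorted by start_block.
std::optional<Hit> FindOp(const std::vector<Op>& ops, uint64_t block) {
    assert(std::is_sorted(ops.begin(), ops.end(), [](const Op& a, const Op& b) {
        return a.start_block < b.start_block;
    }));
    // First op whose start is strictly greater than `block`.
    auto it = std::upper_bound(
            ops.begin(), ops.end(), block,
            [](uint64_t b, const Op& op) { return b < op.start_block; });
    if (it == ops.begin()) return std::nullopt;  // no previous op at all
    --it;                                        // previous op: start_block <= block
    if (!it->is_replace) return std::nullopt;    // step 3c
    uint64_t delta = block - it->start_block;
    if (delta >= it->num_blocks) return std::nullopt;  // outside the compressed range
    return Hit{static_cast<size_t>(it - ops.begin()), static_cast<uint32_t>(delta)};
}
```

A caller would decompress the op at `op_index` once, cache the buffer, and memcpy 4k from `block_offset * 4096` for each sequential request that follows.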
Snapshot-merge:
During merge of REPLACE ops, read the entire op in one shot, de-compress
multiple blocks and write all the blocks in one shot.
Performance:
go/variable-block-vabc-perf covers detailed performance runs
on Pixel 6 for full and incremental OTA.
Bug: 319309466
Test: snapuserd_test covers all the I/O paths with various block sizes.
About 252 cases with all combinations and tunables.
[==========] 252 tests from 4 test suites ran. (702565 ms total)
[ PASSED ] 252 tests.
On Pixel 6:
=======================================
COW Writer V3:
for i in full, incremental OTA
for j in 4k, 16k, 32k, 64k, 128k, 256k
for k in lz4, zstd, gz
install OTA, reboot, verify merge
=======================================
COW Writer V2:
for i in full, incremental OTA
for j in 4k
for k in lz4, zstd, gz
install OTA, reboot, verify merge
=====================================
Change-Id: I4c3b5c3efa0d09677568b4396cc53db0e74e7c99
Signed-off-by: Akilesh Kailash <akailash@google.com>
This patch supports compression for bigger block size.
3 bits [57-59] in the COW Operation "source_info_" field are used to store
the compression factor. Supported compression factors are powers of 2,
viz. 4k, 8k, 16k, 32k, 64k, 128k, 256k.
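As an illustration of how seven power-of-2 factors fit in 3 bits, here is a minimal sketch. The bit positions come from the text above; the encoding itself is an assumption (the field is taken to hold log2(factor / 4k)), and the helper names are hypothetical rather than the libsnapshot API.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kBlockSize = 4096;
constexpr int kFactorShift = 57;  // bits 57-59, per the commit message
constexpr uint64_t kFactorMask = uint64_t{7} << kFactorShift;

// Assumption: the 3-bit field stores log2(factor / 4k), so 0 -> 4k ... 6 -> 256k.
inline uint64_t SetCompressionFactor(uint64_t source_info, uint64_t factor_bytes) {
    assert(factor_bytes >= kBlockSize && factor_bytes <= 64 * kBlockSize);
    assert((factor_bytes & (factor_bytes - 1)) == 0);  // must be a power of two
    uint64_t exponent = 0;
    while ((kBlockSize << exponent) < factor_bytes) ++exponent;
    // Clear the old field, then store the new exponent; other bits are untouched.
    return (source_info & ~kFactorMask) | (exponent << kFactorShift);
}

inline uint64_t GetCompressionFactor(uint64_t source_info) {
    uint64_t exponent = (source_info & kFactorMask) >> kFactorShift;
    return kBlockSize << exponent;
}
```

Under this assumed encoding a field value of zero decodes to 4k, so ops written without a factor read back as plain 4k blocks.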
Only REPLACE operations will have the bigger block size support for now.
This can be extended to other operations later.
The write path in EmitBlocks() has the core logic wherein consecutive
sequence of REPLACE ops are compressed based on the compression factor
settings. Thus, for a 64k compression factor, there will be just one
COW operation which encodes all 16 4k operations, and the entire 64k
block is compressed in one shot.
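The grouping can be pictured with the sketch below. It is not the EmitBlocks() code: `Chunk` and `GroupBlocks` are hypothetical names, and the tail handling is an assumption (when fewer blocks remain than the configured factor, the sketch falls back to the largest power of two that still fits, so every chunk stays a supported size).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one emitted op: a run of consecutive 4k blocks
// that would be compressed together.
struct Chunk {
    uint64_t start_block;
    uint32_t num_blocks;
};

// Splits `num_blocks` consecutive blocks into power-of-two chunks of at most
// `max_blocks_per_op` blocks each (16 for a 64k compression factor).
std::vector<Chunk> GroupBlocks(uint64_t start_block, uint64_t num_blocks,
                               uint32_t max_blocks_per_op) {
    assert(max_blocks_per_op > 0);
    assert((max_blocks_per_op & (max_blocks_per_op - 1)) == 0);  // power of two
    std::vector<Chunk> chunks;
    while (num_blocks > 0) {
        uint32_t n = max_blocks_per_op;
        while (n > num_blocks) n >>= 1;  // largest power of two that still fits
        chunks.push_back({start_block, n});
        start_block += n;
        num_blocks -= n;
    }
    return chunks;
}
```

With a 64k factor, a run of 16 blocks yields a single chunk, which matches the one-op-instead-of-16 case described above.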
NOTE: There is no read I/O path support in this patch. Subsequent patch
will have the read support.
Performance data (with read I/O path support in subsequent patch):
go/variable-block-vabc-perf covers detailed performance runs
on Pixel 6 for full and incremental OTA.
TL;DR:
Performance of a full OTA (All numbers are compared against 4k block
size)
=======================================
Snapshot-size:
~10-11% decrease in snapshot-size (disk-space) for zstd with 256k block
size.
~8% decrease in snapshot-size (disk-space) for lz4
Install time:
~13% decrease in OTA install time for zstd with 256k block size.
Snapshot-merge:
~50% decrease in snapshot-merge time with 256k block size for zstd
Post OTA boot-time:
~10.5% decrease in boot time for 64k block size for zstd
In-memory footprint for COW operations:
~80% decrease in memory footprint for 256k block size. (58MB -> 9.2MB)
============================================
For more improvements, further tuning of zstd/lz4 is
required, primarily the compression levels, the zstd compression window,
and the performance of gz with compression levels.
Bug: 319309466
Test: cow_api test covering all the supported block sizes for v3 writer.
On Pixel 6:
=======================================
COW Writer V3:
for OTA in full, incremental OTA
for block_size in 4k, 16k, 32k, 64k, 128k, 256k
for compression_algo in lz4, zstd, gz, none
install OTA, reboot, verify merge
=======================================
COW Writer V2:
for OTA in full, incremental OTA
for block_size in 4k
for compression_algo in lz4, zstd, gz, none
install OTA, reboot, verify merge
=====================================
Change-Id: I96201f1609582aa9d44d8085852e284b0c4a426d
Signed-off-by: Akilesh Kailash <akailash@google.com>
Intermediate CL needed before variable block size can land. Since v3 is
enabled on cuttlefish, the base build needs to write the
compression_factor in order for the reader to parse it properly.
Otherwise we'll fail the OTA test.
Test: th
Change-Id: Ia353aae8e668858851073f09308909ae70d7854e
If op_count_max is read in as zero, we should use the upper bound of
max blocks as the estimate. One case in which this can happen is when
a v2 cow estimator is used; we should still be able to run an OTA if
we upper bound our ops buffer size estimation.
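A minimal sketch of that fallback, assuming the upper bound is one operation per block in the partition. The function and parameter names are hypothetical, not the v3 writer's.

```cpp
#include <cassert>
#include <cstdint>

// Returns the number of op slots to reserve. A zero estimate (for example
// from a v2 estimator) falls back to the worst case of one op per block.
inline uint64_t OpBufferEntries(uint64_t op_count_max, uint64_t partition_bytes,
                                uint64_t block_size) {
    assert(block_size > 0);
    if (op_count_max != 0) return op_count_max;
    // Round up so a trailing partial block still gets a slot.
    return (partition_bytes + block_size - 1) / block_size;
}
```

The fallback over-allocates when large compression factors are in use, but it keeps the OTA runnable when no estimate is available.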
Test: th
Change-Id: I97ca66368d6631bf43c8911ed66f99c9e8096e2d
Parse manifest compression_factor and set CowOptions appropriately. This
allows v3 writer to use compression factor in OTA. Also update some
comments about supported compression algorithms.
Test: th
Change-Id: I88f254087e536d9e5925064f85317f0acce280ee
With variable block size compression being added, the number of ops
written can no longer be calculated directly, since one op can now cover
the data that previously took multiple ops. We can get rid of this check
for XOR and Raw blocks, as within WriteOperation() we already check
whether we are exceeding the op_count_max limit.
We still need to keep this check for EmitZeroBlocks and EmitCopyBlocks
since the number of operations is determined ahead of time in those
function calls. Without this check in place, the ops would be added to
the cached ops and the call would return true even though the ops
cannot be written.
With this change, v3 cow OTA now works on cuttlefish with support for
variable block size compression.
Test: th
Change-Id: Ia55f152f5deb67a9022d0feff112345e72741dd3
Changes to structure of v3 header + operation needed for variable block
size. Separating this CL from the variable block size one so we can get
v3 enabled on cuttlefish.
The op count type changes are so that op count matches the type of
max_blocks. max_blocks is used when op buffer size is not set -> we
default to upper bound of one operation per block in the partition.
Test: th
Bug: 307452468
Change-Id: I1a2581763a4fd6be5d5795f7e4781023e9984256
On devices without metadata encryption, we use loop devices rather than
device-mapper + dm-linear + FIEMAP. Devices without metadata encryption
should not exist, since libfiemap was introduced with Android R, which
requires metadata encryption.
It is possible to retrofit an Android Q device with Virtual A/B, which
is what Pixel 4 did. However those devices can only upgrade to
Android T, and they had metadata encryption anyway.
If there are any Android Q devices that retrofitted Virtual A/B in R,
didn't have metadata encryption, and need to upgrade all the way to V,
then we can recommend they make WrapUserdataIfNeeded() unconditional.
Bug: N/A
Test: fiemap_image_test, vts_libsnapshot_test
Change-Id: I7be0507527b967166676c8b136b8758f5e69ba6b
Right now we encode the per mountpoint scratch dir name like this:
/system -> /mnt/overlay/@system/
/product/app -> /mnt/overlay/@product@app/
This CL changes it to:
/system -> /mnt/overlay/system/
/product/app -> /mnt/overlay/product@app/
This makes it so that the encoded path for top-level mountpoints (like
/system, /vendor) would have the same encoded scratch dir as before
https://r.android.com/2795755 was introduced.
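The new encoding amounts to dropping the leading '/' and replacing the remaining '/' characters with '@'. The helper below is a hypothetical sketch of that rule, not the fs_mgr function, and it assumes an absolute mountpoint without a trailing slash.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Maps a mountpoint to its scratch dir under /mnt/overlay using the new scheme:
// "/system" -> "/mnt/overlay/system/", "/product/app" -> "/mnt/overlay/product@app/".
std::string ScratchDirFor(const std::string& mount_point) {
    assert(!mount_point.empty() && mount_point.front() == '/');
    std::string name = mount_point.substr(1);  // drop the leading '/'
    std::replace(name.begin(), name.end(), '/', '@');
    return "/mnt/overlay/" + name + "/";
}
```

Because top-level mountpoints contain no inner '/', their encoded name is just the directory name, which is why it matches the older format.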
With this change, old first-stage-init can handle top-level remounts
correctly. However, for mountpoints with '/' in them, the remount
scratch dirs would be encoded with the new format, and old
first-stage-init would ignore them and not set them up during boot.
This lets the remount mechanism function partially when running on
an old ramdisk (first-stage-init) + new system combo.
Normally we expect the init_boot ramdisk to be upgraded alongside
system.img, so this change isn't strictly needed. However there are
cases where we might want to develop new OS features on old vendor
platform, thus this change.
Bug: 306124139
Bug: 243503963
Test: adb-remount-test
Change-Id: I9b43641bb338f11c6c83888880948e4b85af14e1
Some testcases assume that /dev/block/by-name/userdata is writable, but
mount_with_alternatives() will mark block device as RO if mount flag
includes MS_RDONLY. Fix it by marking the block device as RW again.
Test: th
Bug: 319156415
Change-Id: Ic04acd4b6175d3f0aeea88675da44309e8df15e8
Right now we assume all RW mounts (minus /data & special FS) are
remounted by us and we apply the remount/overlayfs related checks
on them unconditionally. This would generate false positives when
a partition was RW but not remounted by us.
The test should instead check mounts that were remounted by us
(transitioned from RO to RW after adb-remount), and ignore
partitions that were already RW before running adb-remount.
Bug: 313609600
Test: adb-remount-test
Change-Id: I94e8a35775271f557790a458781657eb3b24a6f5
Assign CPUSET_SP_BACKGROUND taskprofile to snapshot merge threads.
This will ensure that the threads will not run on big cores.
Additionally, reduce the flushing of data to 1MB after merging REPLACE ops.
No major regression observed on snapshot merge time.
On Pixel 6 for incremental OTA of 500M, snapshot merge time increased
from 72 seconds to 76 seconds after this patch.
Bug: 311233916
Test: Full and incremental OTA on Pixel 6 - Verify merge threads not on big cores
Change-Id: I455afdac0b77227869d846d0c4472ea9eb34c41c
Signed-off-by: Akilesh Kailash <akailash@google.com>
Some rw /proc/mounts entries are FUSE.
Also, add some diagnostics for failures.
Bug: 318962836
Test: vts_fs_test on Pixel
Change-Id: I85dec8b37f1a061b1eca597aba3887b598b699f5
Before this patch, DeleteDeviceWithTimeout was checking that the dev
node (i.e. /dev/block/dm-XX) is deleted after the call to DeleteDevice
API. Since ueventd first deletes the symlinks that correspond to a
device and only then deletes this device node, this assertion introduced
a race condition (DeleteDevice API waits for the symlink to be deleted).
This patch changes the DeleteDeviceWithTimeout test to check that unique
path of the device has been deleted.
Bug: 318425605
Test: presubmit
Change-Id: I3fd9de507c75bcf6ac1350fa0b8adfdb5a2e89e8
According to aosp/1908136, the current flow is
1. factory reset formatted raw disk.
2. next boot tries to convert it to metadata encryption
2.a mount sda27
2.b umount sda27
2.c encrypt_inplace()
2.d fsck on dm-x
2.e mount dm-x
If there are some write file operations between 2.a and 2.b, encryption
might fail. To mitigate, change the mount in 2.a to readonly if we know
we are going to do encrypt_inplace.
Test: th
Bug: 313962438
Change-Id: I7f4bbd36e1e6c978dde84f5396ffb90bbbdcae87
Performance of COW v3 is now on par with v2 in both multi-threaded and
single-threaded configurations. Note, v2 cow writer can cache up to 1024
blocks in memory if multi-threaded compression is enabled (even though
batch size is configured as 200). For a fair comparison, benchmarks were
run with batch size of 256. For batch size of 256 or greater, v2 and v3
have similar multi-threaded performance.
Test: th
Bug: 313962438
Change-Id: I377c8291689a7a038bb00b09d7371a155e6972e9
Related change: r.android.com/1110379
noatime reduces the wear and tear on the flash device.
Bug: 313609600
Test: abtd adb-remount-test
Change-Id: Ia42a064f297c25d3463a4ed9094a66236a6c5708
Adding a check here to ensure that next_data_pos_ isn't modified since
initialization. After sizing the sequence buffer, this value should be
the initialized value + the size of sequence buffer.
Test: cow_api_test
Change-Id: I9c79041b72544500989860a13ca6c25830d28750
Update snapshot.cpp to grab estimate_op_buffer_size &
estimate_sequence_buffer_size from update_engine. Update v3 writer to
use these options to size the buffers appropriately.
We probably don't need the fields for merge metrics yet, but will leave
them here for now.
Test: th
Bug: 313962438
Change-Id: I08252ff66174de9bafaf8dbe9115d9d049084c4c
Adding a cow size info struct, as the writer will now need to know the op
buffer size at the time of initialization. The sequence of events is as
follows (same as estimate_cow_size, but putting it down here for clarity):
1. ota_from_target_files does dry run to determine cow size + ops buffer
size
2. data is passed through delta archive manifest
3. snapshot.cpp parses these fields and configures the CowOptions struct to
pass to writer initialization
4. cow is initialized with correct sizing. Data is incrementally added
at the ends of the cow ops buffer (which is why we need to know the
sizing ahead of time)
Test: ota
Change-Id: I950e5ef82c9bd7e9bd9603b0599c930767ee3f0d
libsnapshot test is run in an independent configuration from
kernel-presubmit. When run in kernel-presubmit, it fails because it
creates another daemon on top of the daemon that is already running from
first stage init.
Bug: 316040872
Test: N/A
Change-Id: Ie3381d6db35bb85fbb47326fa49938416d49f2b8
Signed-off-by: Edward Liaw <edliaw@google.com>
Currently the only ways to enable dm-verity rely on its built-in
vbmeta image or on a standalone vbmeta image containing its public key.
This change adds support for enabling dm-verity based on the hashtree
descriptor root digest for a standalone vbmeta image.
Bug: 285855436
Test: Presubmit
Test: adb shell /apex/com.android.virt/bin/vm run-microdroid --vendor /vendor/etc/avf/microdroid/microdroid_vendor.img
Change-Id: I51eb64cae2ca8b4e97f1c6419b35d45e6f51cacb
Performance of V3 COW writer is now on-par with V2 in both incremental
OTA and full OTA.
Test: th
Bug: 313962438
Change-Id: If56e0fe42367f947c513fc4c93119c3825763cb9
If the daemon is alive, detach it before explicitly terminating service.
Bug: 316876960
Test: treehugger presubmit tests
Change-Id: I94d9d1a0dab09a6b016f422c7497098abc86add8
Signed-off-by: Akilesh Kailash <akailash@google.com>
If sequence data is written and the number of ops reaches the maximum,
op data will corrupt the block data because the location of block data
is stale after writing sequence data. Fix by resetting the location of
block data after EmitSequenceData().
Test: th
Bug: 313962438
Change-Id: Ib53b81772ba341cdf5c240baaee7c10725a365c3
This adds a new metadata header flag to the super partition. This flag
is set when "adb remount" is used, and is implicitly cleared when
flashing.
If there is a scratch partition present on /data, we require that the
flag be set in order to proceed using overlays. If not set, scratch is
not mapped in first-stage init, and scratch images are removed later
during startup.
Bug: 297923468
Test: adb remount -R, touch file in out/, sync, flashall
Change-Id: I9cc411a1632101b5fc043193b38db8ffb9c20e7f
There is no need to connect to daemon for legacy VAB.
Bug: 311900089
Test: treehugger - presubmit
Change-Id: I2256cee611431ab2a286730c61092d2c546caf1e
Signed-off-by: Akilesh Kailash <akailash@google.com>
During PrepareSnapshotPartitionsForUpdate, we attempt to connect to
snapuserd with a 5s timeout, only to tell snapuserd to shut down
immediately. If snapuserd isn't running, we will wait out the whole 5
seconds. Change the logic to return early if the socket_connect() call
returns ENOENT, indicating that the snapuserd socket isn't used by any
process. This reduces allocateSpaceForPayload() time from 6s to 1s.
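The early-return idea can be sketched with plain POSIX sockets. This is an illustration only: it does not use the helper the patch refers to, and `ConnectResult` and `TryConnectOnce` are hypothetical names. A retry loop would stop immediately on `kNoSocket` instead of sleeping until the timeout expires.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

enum class ConnectResult { kConnected, kNoSocket, kRetry };

// Makes one connection attempt to a UNIX domain socket at `path`.
// On success, stores the connected descriptor in *out_fd.
ConnectResult TryConnectOnce(const std::string& path, int* out_fd) {
    int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
    if (fd < 0) return ConnectResult::kRetry;

    sockaddr_un addr{};  // zero-initialized, so sun_path stays NUL-terminated
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, path.c_str(), sizeof(addr.sun_path) - 1);

    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
        *out_fd = fd;
        return ConnectResult::kConnected;
    }
    int saved_errno = errno;  // close() may overwrite errno
    close(fd);
    // ENOENT: the socket path does not exist, so nothing is listening and
    // waiting for the full timeout would be pointless.
    return saved_errno == ENOENT ? ConnectResult::kNoSocket : ConnectResult::kRetry;
}
```

Other errors, such as ECONNREFUSED, still map to `kRetry`, since the daemon may be in the middle of starting up.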
Test: th
Bug: 315215541
Change-Id: Ib24d7c63733a896c082ac92aaa88ad52d050a2a5
This reverts commit 9d0c06d3e2.
The failure is fixed by https://r.android.com/2725997. Workaround no
longer needed.
Test: manual
Bug: 208565717
Bug: 295944813
Change-Id: I83638938bf52a4b2b1e72743f892c579622ba9e6