Commit graph

1006 commits

Author SHA1 Message Date
Daniel Zheng
070848b9ef libsnapshot: add check for updating next_data_pos_
Adding a check here to ensure that next_data_pos_ isn't modified since
initialization. After sizing the sequence buffer, this value should be
the initialized value + the size of sequence buffer.

Test: cow_api_test
Change-Id: I9c79041b72544500989860a13ca6c25830d28750
2024-01-03 15:19:46 -08:00
Daniel Zheng
3f3162c217 libsnapshot: get options from protobuf fields
Update snapshot.cpp to grab estimate_op_buffer_size &
estimate_sequence_buffer_size from update_engine. Update v3 writer to
use these options to size the buffers appropriately.

We probably don't need the fields for merge metrics yet, but will leave
them here for now.

Test: th
Bug: 313962438
Change-Id: I08252ff66174de9bafaf8dbe9115d9d049084c4c
2024-01-03 11:09:37 -08:00
Daniel Zheng
d0c3a04cb0 libsnapshot: add CowSizeInfo struct
Adding a cow size info struct as writer will now need to know the op buffer
size at the time of initialization. The sequence of events is as follows
(same as estimate_cow_size but putting down here for clarity)

1. ota_from_target_files does dry run to determine cow size + ops buffer
   size
2. data is passed through delta archive manifest
3. snapshot.cpp parses these fields and configures the CowOptions struct to
   pass to writer initialization
4. cow is initialized with correct sizing. Data is incrementally added
   at the ends of the cow ops buffer (which is why we need to know the
   sizing ahead of time)

Test: ota
Change-Id: I950e5ef82c9bd7e9bd9603b0599c930767ee3f0d
2024-01-03 11:08:41 -08:00
Treehugger Robot
26cb9dbfef Merge "Support batching ops across Add*Blocks() call" into main 2023-12-20 22:49:38 +00:00
Treehugger Robot
c4b9840456 Merge "Fix EmitSequenceData bug" into main 2023-12-20 22:49:38 +00:00
Kelvin Zhang
a008c9c1a4 Support batching ops across Add*Blocks() call
Performance of V3 COW writer is now on-par with V2 in both incremental
OTA and full OTA.

Test: th
Bug: 313962438
Change-Id: If56e0fe42367f947c513fc4c93119c3825763cb9
2023-12-19 16:32:02 -08:00
Treehugger Robot
e0b444802b Merge "Add op count check before attempting to write operations" into main 2023-12-19 20:18:41 +00:00
Akilesh Kailash
1752c5f249 libsnapshot: Detach the daemon explicitly before stopping the service
If the daemon is alive, detach it before explicitly terminating service.

Bug: 316876960
Test: treehugger presubmit tests
Change-Id: I94d9d1a0dab09a6b016f422c7497098abc86add8
Signed-off-by: Akilesh Kailash <akailash@google.com>
2023-12-18 17:22:06 -08:00
Kelvin Zhang
c85038b866 Fix EmitSequenceData bug
If sequence data is written and the number of ops reaches the maximum,
op data will corrupt the block data because location of block data is
stale after writing sequence data. Fix by resetting location of block
data after EmitSequenceData()

Test: th
Bug: 313962438
Change-Id: Ib53b81772ba341cdf5c240baaee7c10725a365c3
2023-12-15 20:12:20 -08:00
Kelvin Zhang
73ac5f184e Add op count check before attempting to write operations
Test: th
Bug: 313962438
Change-Id: I0e288a42984d737d327236693a6b69c03a7ecc6e
2023-12-14 16:42:45 -08:00
Treehugger Robot
e2c6171f65 Merge changes If3a01ab8,Ib24d7c63 into main
* changes:
  Support batch writes for V3 cow format
  Optimize PrepareSnapshotPartitionsForUpdate runtime
2023-12-14 19:19:31 +00:00
Treehugger Robot
b0c6a5dfd7 Merge "Revert "snapuserd: opt out of Global ThinLTO to workaround segfault"" into main 2023-12-14 00:24:57 +00:00
Kelvin Zhang
557104c85a Support batch writes for V3 cow format
Test: th
Bug: 313962438
Change-Id: If3a01ab85a1a649b7476bee2d56b732f04d0509a
2023-12-13 14:22:12 -08:00
Akilesh Kailash
4ffbc33b14 libsnapshot: skip connecting to daemon for legacy VAB
There is no need to connect to daemon for legacy VAB.

Bug: 311900089
Test: treehugger - presubmit

Change-Id: I2256cee611431ab2a286730c61092d2c546caf1e
Signed-off-by: Akilesh Kailash <akailash@google.com>
2023-12-12 14:12:49 -08:00
Kelvin Zhang
cb3cfc1655 Optimize PrepareSnapshotPartitionsForUpdate runtime
During PrepareSnapshotPartitionsForUpdate, we attempt to connect to
snapuserd with a 5s timeout, only to tell snapuserd to shut down
immediately. If snapuserd isn't running, we wait out the whole 5
seconds. Change the logic to return early if a socket_connect() call
returns ENOENT, indicating that the snapuserd socket isn't used by any
process. This reduces allocateSpaceForPayload() time from 6s to 1s.

Test: th
Bug: 315215541
Change-Id: Ib24d7c63733a896c082ac92aaa88ad52d050a2a5
2023-12-12 13:36:34 -08:00
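
The early return described in cb3cfc1655 can be sketched as follows. This is a minimal illustration that uses plain POSIX `connect()` on an `AF_UNIX` socket, not the `socket_connect()` helper the commit mentions. The names `ProbeResult` and `ProbeUnixSocket` are hypothetical and do not come from libsnapshot.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Hypothetical result type for this sketch.
enum class ProbeResult { kConnected, kNoSocket, kOtherError };

// Tries once to connect to a UNIX domain socket at `path`.
// ENOENT means the socket file does not exist, so no process can be
// listening on it and there is no point in retrying until a timeout.
inline ProbeResult ProbeUnixSocket(const std::string& path) {
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    if (path.size() >= sizeof(addr.sun_path)) return ProbeResult::kOtherError;
    std::memcpy(addr.sun_path, path.c_str(), path.size() + 1);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return ProbeResult::kOtherError;

    int rc = connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr));
    int saved_errno = errno;  // close() may overwrite errno
    close(fd);

    if (rc == 0) return ProbeResult::kConnected;
    return saved_errno == ENOENT ? ProbeResult::kNoSocket : ProbeResult::kOtherError;
}
```

A caller that sees `kNoSocket` can skip the retry loop entirely, which is presumably where the 5 seconds were being spent.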
Yi Kong
edd04f1b38 Revert "snapuserd: opt out of Global ThinLTO to workaround segfault"
This reverts commit 9d0c06d3e2.

The failure is fixed by https://r.android.com/2725997. Workaround no
longer needed.

Test: manual
Bug: 208565717
Bug: 295944813
Change-Id: I83638938bf52a4b2b1e72743f892c579622ba9e6
2023-12-12 16:35:07 +09:00
Kelvin Zhang
84e5e6f751 Support batch writes for non-compressed ops
This also improves atomicity of ops. If a single Add*Blocks() call
with 100 blocks fails in the middle, partially written blocks are
discarded, and the op count on disk stays unchanged. Previously we would
update the op count on disk with partially written blocks, causing
labels to be inaccurate.

Test: th
Bug: 313962438
Change-Id: If175a705f6ec46c1b25c52d0d9f02f01a540ce55
2023-12-06 16:41:25 -08:00
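
The atomicity property described in 84e5e6f751 can be illustrated with a small stage-then-commit sketch. `FakeCow` and `AddBlocksAtomically` are hypothetical stand-ins invented for this example. The real writer batches I/O to a file, which is not modeled here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for on-disk state; not libsnapshot's real writer.
struct FakeCow {
    std::vector<uint64_t> blocks_on_disk;
    uint64_t op_count_on_disk = 0;
};

// Stages every block of one call and commits only if all of them succeed.
// `fail_at` simulates an I/O error when that block number is reached.
inline bool AddBlocksAtomically(FakeCow& cow, const std::vector<uint64_t>& blocks,
                                uint64_t fail_at) {
    std::vector<uint64_t> staged;
    staged.reserve(blocks.size());
    for (uint64_t block : blocks) {
        if (block == fail_at) return false;  // staged blocks are discarded
        staged.push_back(block);
    }
    // Commit point: data and op count are updated together.
    cow.blocks_on_disk.insert(cow.blocks_on_disk.end(), staged.begin(), staged.end());
    cow.op_count_on_disk += staged.size();
    return true;
}
```

Because the op count only moves at the commit point, a label written afterwards never covers half of a failed call.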
Kelvin Zhang
d45c50911f Fix cow v3 size estimation errors
Current cow size estimation is computed by taking the offset of the next
data write. However, during COW size estimation we repeatedly increment
op_count_max. Since the data section is placed right after the op section,
incrementing op_count_max has the effect of moving the beginning of the
data section. next_data_pos_ is never updated in this process, causing
the final estimate to be off.

Test: th
Bug: 313962438
Change-Id: I250dff54c470c9c20d6db33d91bac898358dee31
2023-12-05 21:27:38 -08:00
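
The estimation error described in d45c50911f comes down to simple offset arithmetic, sketched below. The `Estimator` type, its field names, and the sizes used are assumptions for illustration only. The actual v3 header and op sizes are not stated in the commit.

```cpp
#include <cstdint>

// Hypothetical model of the layout: [header][op section][data section].
struct Estimator {
    uint64_t header_size;
    uint64_t op_size;
    uint64_t op_count_max = 0;
    uint64_t data_bytes = 0;
    uint64_t stale_next_data_pos;  // cached once and never re-based (the bug)

    Estimator(uint64_t header, uint64_t op)
        : header_size(header), op_size(op), stale_next_data_pos(header) {}

    // The data section starts right after the op section.
    uint64_t DataStart() const { return header_size + op_size * op_count_max; }

    void AddOp(uint64_t data_len) {
        ++op_count_max;                   // op section grows by one entry
        data_bytes += data_len;
        stale_next_data_pos += data_len;  // ignores that DataStart() moved
    }

    uint64_t StaleEstimate() const { return stale_next_data_pos; }
    uint64_t CorrectedEstimate() const { return DataStart() + data_bytes; }
};
```

In this model the stale estimate is short by exactly `op_size * op_count_max` bytes, which is the space the op section claimed while it was growing.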
Kelvin Zhang
ad51f09b05 Merge "Fix multiple calls to set_[source/type]" into main 2023-12-05 20:15:41 +00:00
Kelvin Zhang
12e0531224 Fix multiple calls to set_[source/type]
Since the current implementation just does a bitwise OR with the
storage unit, multiple calls to set_source would result in overlapping
bits. Fix by first clearing the existing storage.

Test: th
Bug: 313962438
Change-Id: Iecfe8dd244c0f65ecd3cacb0404fdc39ef836d97
2023-12-05 10:28:12 -08:00
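
The bug fixed in 12e0531224 is easy to reproduce in isolation. The sketch below assumes a hypothetical 48-bit source field in the low bits of a 64-bit word. The mask width and the `Op` type are made up for this example and are not taken from cow_format.h.

```cpp
#include <cstdint>

// Assumed layout for illustration: low 48 bits hold the source.
constexpr uint64_t kSourceMask = (uint64_t{1} << 48) - 1;

struct Op {
    uint64_t source_info = 0;

    // Buggy variant: ORs into whatever bits are already set.
    void set_source_buggy(uint64_t source) { source_info |= source & kSourceMask; }

    // Fixed variant: clear the field first, then store the new value.
    void set_source(uint64_t source) {
        source_info &= ~kSourceMask;
        source_info |= source & kSourceMask;
    }

    uint64_t source() const { return source_info & kSourceMask; }
};
```

With the buggy setter, storing 0x5 and then 0xA reads back as 0xF. With the fixed setter the second call wins, and bits outside the mask are left untouched.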
Akilesh Kailash
160e4c3cee Merge changes from topic "ota-tune-1" into main
* changes:
  Allow direct reads on source device
  libsnapshot: Tune readahead during OTA for source and COW block device
2023-12-04 23:53:41 +00:00
Akilesh Kailash
52f1c19a17 Allow direct reads on source device
Allow O_DIRECT reads on source block device.
This will further cut down the Active and Inactive file pages
during partition verification.

On Pixel 6 after incremental OTA - Post OTA reboot:

                 Without patch      With patch     Delta
--------------------------------------------------------
Inactive(File):  4992MB             3887MB         ~22%
Active(File):    1465MB             1014MB         ~30%

Boot time however increases from 25 to 30 seconds.

This is not yet enabled. This will be behind a sysprop flag
or for low memory devices and will be enabled later.

Additionally, set the priority of worker threads to normal.
Merge threads priority is reduced. This will help low memory
devices as tested on Pixel watch.

Bug: 311233916
Test: OTA on Pixel 6
Change-Id: Icacdef08d68e28d3062611477703e7cf393a9f10
Signed-off-by: Akilesh Kailash <akailash@google.com>
2023-12-04 14:07:42 -08:00
Akilesh Kailash
a8f6ce3344 libsnapshot: Tune readahead during OTA for source and COW block device
Scanning of partitions post OTA leads to memory pressure. Tune
the read-ahead of source and COW block device. This is currently
set to 32KB.

This reduces Inactive(file) and Active(file) usage during entire
duration of boot post OTA.

On Pixel 6: For incremental OTA ~400M. During boot:

                            Without-patch         With-patch    Delta
                            --------------------------------------------
1: Peak Inactive(file):     4469MB                3118MB        ~30%
2: Peak Active(file):       985MB                 712MB         ~27%

No regression observed on boot time.

Additionally, cut down the number of threads to verify the partitions.

Bug: 311233916
Test: Incremental OTA on Pixel 6
Change-Id: I0b842776c36fa089c39c170fa7bf0f246e16636d
Signed-off-by: Akilesh Kailash <akailash@google.com>
2023-12-04 14:07:26 -08:00
Kelvin Zhang
7d526560df Allow Cow version v3 to be used
This does not change the default cow version. Currently v2 is still the
default. This CL only enables OEMs to set virtual_ab_cow_version to 3
for testing purposes.

Test: th
Bug: 313962438
Change-Id: I7a328fa32283560a48604ffe02edd2551ac49a83
2023-12-04 09:29:20 -08:00
Kelvin Zhang
2bf1da5d2c Shove CowOperation type into source_info
We can shove type into source_info to save 8 bits per cow operation.
We only need 4 bits inside of source_info to enumerate all the types of
Cow Operation. Since CowOperationV3 is not used on disk (just yet), we
can make format changes.

This CL is mechanical:
    1. Remove the .type field from CowOperation struct
    2. Add a type() getter method to CowOperation struct
    3. Replace all existing usage of `type` member with the new getter

No functional changes, just refactorings.

Test: th
Bug: 304602386
Bug: 313962438

Change-Id: I85d89c71fc6afede12ea299a4a3e3b2184ea2d8b
2023-12-04 09:14:20 -08:00
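
The packing described in 2bf1da5d2c can be sketched as a 4-bit field inside a 64-bit word with a `type()` getter. The bit position (top 4 bits), the `OpType` values, and the `PackedOp` name are assumptions for this example. The commit only states that 4 bits of source_info are enough.

```cpp
#include <cstdint>

// Hypothetical op types; the numeric values are illustrative only.
enum class OpType : uint8_t { kCopy = 1, kReplace = 2, kZero = 3, kXor = 4 };

// Assumed placement: the type lives in the top 4 bits of source_info.
constexpr int kTypeShift = 60;
constexpr uint64_t kTypeMask = uint64_t{0xF} << kTypeShift;

struct PackedOp {
    uint64_t source_info = 0;

    // Getter that replaces direct access to a separate `type` member.
    OpType type() const {
        return static_cast<OpType>((source_info & kTypeMask) >> kTypeShift);
    }

    // Clears the 4-bit field before storing, leaving the other bits intact.
    void set_type(OpType t) {
        source_info = (source_info & ~kTypeMask) |
                      (static_cast<uint64_t>(t) << kTypeShift);
    }
};
```

Callers that used to read `op.type` would switch to `op.type()`, which is the mechanical part of the change.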
Akilesh Kailash
37e7498fc0 Merge "snapshotctl: fsync after writing every 1MB buffer" into main 2023-11-29 20:37:55 +00:00
Akilesh Kailash
b78d0e2856 snapshotctl: fsync after writing every 1MB buffer
Sync writes after every 1MB instead of flushing at the end.

Bug: 299011882
Test: Boot device off snapshots
Change-Id: If91168ec92c2b2995bdf296ea1c7d4c261b12411
Signed-off-by: Akilesh Kailash <akailash@google.com>
2023-11-29 11:26:46 -08:00
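
A periodic sync like the one described in b78d0e2856 might look like the sketch below, which calls `fsync()` after each chunk of at most 1MB. `WriteWithPeriodicSync` is a hypothetical helper written for this example and is not the snapshotctl code.

```cpp
#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Writes `len` bytes to `fd`, calling fsync() after every `sync_every` bytes
// (and once more after a trailing partial chunk).
// Returns the number of fsync() calls made, or -1 on error.
inline int WriteWithPeriodicSync(int fd, const char* data, std::size_t len,
                                 std::size_t sync_every = std::size_t{1} << 20) {
    if (sync_every == 0) return -1;
    int syncs = 0;
    std::size_t off = 0;
    while (off < len) {
        const std::size_t chunk_end = off + std::min(sync_every, len - off);
        while (off < chunk_end) {
            ssize_t n = write(fd, data + off, chunk_end - off);
            if (n < 0) {
                if (errno == EINTR) continue;  // interrupted, retry
                return -1;
            }
            if (n == 0) return -1;  // no progress; avoid spinning forever
            off += static_cast<std::size_t>(n);
        }
        if (fsync(fd) != 0) return -1;
        ++syncs;
    }
    return syncs;
}
```

Syncing per chunk bounds how much dirty data can pile up in the page cache, at the cost of more frequent flushes.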
Kelvin Zhang
1ccb347e87 Turn CowOperationType into an enum
There was a bug previously where we compared the return value of
GetCowOpSourceInfoData() with CowOperationType. Such bugs are possible
because cow operation enums are weakly typed integers. Turn
CowOperationType into a strongly typed enum to prevent such bugs.

Test: th
Bug: 304602386
Change-Id: If6941a4740c374ed066cf0aee9e52f4df05a9b38
2023-11-29 10:46:10 -08:00
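
The class of bug that 1ccb347e87 prevents can be shown with a pair of toy enums. Everything below (`WeakOpType`, `StrongOpType`, `GetSourceInfoData`) is a hypothetical stand-in, not the real libsnapshot declarations.

```cpp
#include <cstdint>
#include <type_traits>

enum WeakOpType : uint8_t { kWeakCopy = 1, kWeakReplace = 2 };   // unscoped
enum class StrongOpType : uint8_t { kCopy = 1, kReplace = 2 };   // scoped

// Hypothetical stand-in for a function returning raw source data.
inline uint64_t GetSourceInfoData() { return 1; }

// Compiles without complaint: the unscoped enum converts to an integer,
// so unrelated data is silently compared against an op type.
inline bool BuggyCompare() { return GetSourceInfoData() == kWeakCopy; }

// The equivalent `GetSourceInfoData() == StrongOpType::kCopy` is rejected
// by the compiler, because a scoped enum has no implicit integer conversion.
static_assert(std::is_convertible<WeakOpType, uint64_t>::value,
              "unscoped enums convert implicitly");
static_assert(!std::is_convertible<StrongOpType, uint64_t>::value,
              "scoped enums do not convert implicitly");
```

With `enum class`, a comparison like the buggy one needs an explicit `static_cast`, so the mistake becomes visible in review.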
Daniel Zheng
43aeb22858 libsnapshot: add sequence data
Have the v3 writer write sequence data. Sequence data will be written after
the scratch space and before the resume space. Since this is just a list
of integers, writing and reading should be trivial.

Test: cow_api_test
Change-Id: If3b6b1cfa155aeb65bf693263fc373154ba8e81d
2023-11-21 09:50:16 -08:00
Daniel Zheng
209fda3562 libsnapshot: move header op count setup
Op count should be set before we sync the header. This way subsequent
writers can initialize with the correct op buffer size.

Test: cow_api_test
Change-Id: I56a0d747b3f2a1d9d582d8f9d643b81cbdd9b8d7
2023-11-20 11:53:21 -08:00
Daniel Zheng
763776435d libsnapshot: sync header metadata
After we emit a label, we need to update the number of resume
points + sequence data and op_count. Realistically we could just call
Finalize, but maybe syncing these specific fields could prevent
unexpected outcomes.

Test: cow_api_test
Change-Id: I1585601a134221689ce8d5675a2a3e32f1e8a0e6
2023-11-20 11:53:20 -08:00
Daniel Zheng
5d30009a7e libsnapshot: update variable name
updating name to count rather than buffer size

Test: cow_api_test
Change-Id: I9e44330e7a230b5ab5f5e914ef74a63cc4ebaa61
2023-11-20 11:40:36 -08:00
Daniel Zheng
95cc6b6f01 libsnapshot: update resume offset calculation
Update resume offset calculation to use function call

Test: cow_api_test
Change-Id: I7a9a86dc007110d02d889d1e59b24c3068b8d9e9
2023-11-20 03:30:40 -08:00
Akilesh Kailash
91161042b7 Merge "Disable partition verification when device boots on snapshot" into main 2023-11-15 05:14:50 +00:00
Akilesh Kailash
889a5d23af Disable partition verification when device boots on snapshot
Partition verification is not needed when the device boots on snapshot
without a slot switch.

This also saves a couple of seconds of boot time.

Bug: 299011882
Test: Boot device on snapshot, OTA on Pixel
Change-Id: I5b781de7e0f745bbfe9646f88ca912139b2d853e
Signed-off-by: Akilesh Kailash <akailash@google.com>
2023-11-14 16:58:59 -08:00
Daniel Zheng
59ce7a45d1 libsnapshot: update offset functions
Since these functions are used across both parser and writer, updating
it as inline functions in cow_format.

Test: cow_api_test
Change-Id: I9824684e3b9b48947accce935335d4019d745ae0
2023-11-14 12:40:12 -08:00
Daniel Zheng
3200697586 Merge "libsnapshot: resume_point_count" into main 2023-11-14 20:39:42 +00:00
Daniel Zheng
9270152900 Merge changes I19568d11,I08204e2d into main
* changes:
  libsnapshot: update FindResumeOp type
  libsnapshot: v3 writer GetCowSize
2023-11-14 20:30:19 +00:00
Akilesh Kailash
f1f06f8678 libsnapshot: Check if OTA update in progress during reboot
If any of the read-only partitions are mounted off dm-user,
then an update is certainly in progress.

Bug: 308900853
Test: OTA on Pixel, reboot during OTA.
Change-Id: I36121e1d99ec7c1f1110a65fc67996190875af18
Signed-off-by: Akilesh Kailash <akailash@google.com>
2023-11-14 16:04:17 +00:00
Daniel Zheng
f897650f6e libsnapshot: resume_point_count
We want to add a resume_point_count in the header to represent how many
resume points we've written. In the case that we've written less than
resume_buffer_size, we only want to read the valid resume points.

Without these changes, incremental OTA runs into a segfault or has faulty
data when trying to FindResumeOp(), as our resume points contain invalid
entries.

Test: full ota followed by inc ota on cuttlefish
Change-Id: I0a8971955439639f2d0f39d9d518c1145ae15c3d
2023-11-13 15:57:25 -08:00
Daniel Zheng
a503453767 libsnapshot: update FindResumeOp type
Update FindResumeOp to take in a uint64_t to match the value of the
caller function

Test: ota with following CL
Change-Id: I19568d119b7ebd75ea9e98970b311ae7da92ff0e
2023-11-13 15:57:24 -08:00
Daniel Zheng
c1a18756dc libsnapshot: v3 writer GetCowSize
Cow size should just be wherever the last data position is written. In
v3 we no longer have a footer, so this calculation is simple. This
function is used by cow estimator

Test: cow_api_test
Change-Id: I08204e2d560b120450019a529baa41de9b8e66d5
2023-11-13 13:17:50 -08:00
Daniel Zheng
e343580f72 Merge "libsnapshot: update cow estimation" into main 2023-11-13 18:13:09 +00:00
Daniel Zheng
a4f80e5ca3 Merge "libsnapshot: implement resume buffer" into main 2023-11-13 18:13:01 +00:00
David Anderson
92b29e1925 libsnapshot: Add a test case for recent decompress regression.
The attached test data is a 4096 length byte run that gz compresses to
exactly 4096 bytes. This exposes an edge case in CowReader::ReadData
with v2 snapshot files.

Bug: 310191184
Test: cow_api_test
Change-Id: I35e8d7e939d607d1dc118285ebc2f636c2291a20
2023-11-11 00:29:41 +00:00
Daniel Zheng
79a68a934b libsnapshot: update cow estimation
Cow estimator needs to update next_data_pos_ to be in the correct
position

Test: th
Change-Id: I1e3f2c9434573197e840be5637a90c679610ac4e
2023-11-09 15:49:15 -08:00
Daniel Zheng
c2ce084889 libsnapshot: implement resume buffer
Add resume space to cow v3. Resume buffer goes after header and scratch
space, and is currently set to contain 4 resume points. When AddLabel is
called, the oldest label is replaced with newest one.

Parser will parse up until the last resumable op from a given label.

Test: cow_api_test
Change-Id: Ie072f245721776887d59c96dad296965ad31a5cc
2023-11-09 15:48:42 -08:00
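
The resume buffer behaviour described in c2ce084889 (four slots, oldest label replaced by the newest) can be sketched as a small fixed-capacity buffer. The `ResumePoint` fields and the `ResumeBuffer` class are assumptions for illustration. The on-disk encoding is not described in the commit. Tracking a count of valid entries also reflects the idea behind resume_point_count in f897650f6e: lookups never read uninitialized slots.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical resume point: a label and the op index it maps to.
struct ResumePoint {
    uint64_t label;
    uint64_t op_index;
};

class ResumeBuffer {
  public:
    static constexpr std::size_t kCapacity = 4;

    // Appends a label; once full, the oldest entry is dropped.
    void AddLabel(uint64_t label, uint64_t op_index) {
        if (count_ < kCapacity) {
            points_[count_++] = ResumePoint{label, op_index};
            return;
        }
        for (std::size_t i = 1; i < kCapacity; ++i) points_[i - 1] = points_[i];
        points_[kCapacity - 1] = ResumePoint{label, op_index};
    }

    // Scans only the valid entries, never the unused slots.
    bool FindResumeOp(uint64_t label, uint64_t* op_index) const {
        for (std::size_t i = 0; i < count_; ++i) {
            if (points_[i].label == label) {
                *op_index = points_[i].op_index;
                return true;
            }
        }
        return false;
    }

    std::size_t count() const { return count_; }

  private:
    std::array<ResumePoint, kCapacity> points_{};
    std::size_t count_ = 0;
};
```

After five labels are added, the first one can no longer be resumed from, while the four most recent ones can.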
Daniel Zheng
04ca59d6ff libsnapshot: add compatibility check
Ensure that cow was written by v3 writer for the data_length check to
work. All ops written by v2 writer should go through the decompressor
path if a compressor was used.

Test: cow_api_test
Change-Id: I053d6fdaf29ef7001e68f43b45d5a3ff1a36b1c3
2023-11-09 11:32:02 -08:00
Daniel Zheng
e363841e75 libsnapshot: Add single threaded compression to v3 writer
Add compression path back into Cow operations. Main change is that the
compression algorithm is stored in the header instead of each individual
op. Have the writer_v3 set this algorithm when parsing options.

There looks to be a lot of code we'll be able to factor out into the
base class, but we can leave that to a later CL.

Test: cow_api_test
Change-Id: Ie9a8eceb5fbdaecae50911119c75f2e51d776a28
2023-11-07 14:28:57 -08:00
Daniel Zheng
1c9f0474a6 test_v3: write multiple ops
test case for reading and writing multiple operations of different types

Test: cow_api_test
Change-Id: I8d59a460a50c7054df0b17dc44dd6605048682aa
2023-11-06 15:05:22 -08:00