snapuserd is used as a user-space block device implementation during
Virtual A/B Compression-enabled updates. It has to be started in
first-stage init, so that updated partitions can be mounted.
Once init reaches second-stage, and sepolicy is loaded, we want to
re-launch snapuserd at the correct privilege level. We accomplish this
by rebuilding the device-mapper tables of each block device, which
allows us to re-bind the kernel driver to a new instance of snapuserd.
After this, the old daemon can be shut down.
Ideally this transition happens as soon as possible, before any .rc
scripts are run. This minimizes the amount of time the original
snapuserd is running, as well as any ambiguity about which instance of
snapuserd is the correct one.
The original daemon is sent a SIGTERM signal once the transition is
complete. The pid is stored in an environment variable to make this
possible (these details are implemented in libsnapshot).
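
The signalling step can be sketched as follows. This is only an illustration, not the libsnapshot code: the variable name FIRST_STAGE_SNAPUSERD_PID and the helper stop_first_stage_daemon are assumptions made for the example, and the environment and kill function are parameters so the logic can be exercised without a real daemon.

```python
import os
import signal

# Hypothetical name: the commit only says the pid is kept in an
# environment variable, not what that variable is called.
PID_ENV_VAR = "FIRST_STAGE_SNAPUSERD_PID"


def stop_first_stage_daemon(environ=os.environ, kill=os.kill):
    """Send SIGTERM to the pid recorded in the environment.

    Returns True if a signal was sent, False if no usable pid was found
    or the process no longer exists.
    """
    raw = environ.get(PID_ENV_VAR)
    if raw is None:
        return False
    try:
        pid = int(raw)
    except ValueError:
        return False
    if pid <= 1:
        # Never signal init or a process group by accident.
        return False
    try:
        kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        # The old daemon already exited on its own.
        return False
    return True
```

Refusing pids of 1 or lower is a defensive choice for the sketch, since a negative or zero pid would address a process group.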
Bug: 168259959
Test: manual test
Change-Id: Ife9518e502ce02f11ec54e7f3e6adc6f04d94133
This patch introduces the fundamentals needed to support booting off
dm-user. First, a method has been added to start snapuserd in
first-stage init. It simply forks and execs, creates a specially named
first-stage socket, then waits for requests.
Next, a new method has been added to SnapshotManager to perform a
second-stage handoff. This works by first launching a second copy of
snapuserd using init's normal service management functionality. The new
snapuserd runs alongside the original, but has correct privileges and a
correct selinux context. Next, we inspect each COW device, and if its
table uses dm-user, we replace the table with a renamed control
device. The new control device is bound to the new snapuserd.
device-mapper guarantees that such a table swap is safe. It flushes I/O
to the old table and then replaces it with the new table. Once the new
table is in place, the old dm-user control devices are automatically
destroyed. Thus, once all dm-user devices have been transitioned, the
first-stage daemon is idle and can gracefully exit.
This patch does not modify init. A few changes will be needed on top of
this patch:
(1) CreateLogicalAndSnapshotPartitions will need further changes to
start the first-stage daemon and track its pid. Additionally, it will
need to ensure the named socket file is deleted, so there is no further
IPC allowed after partitions are completed.
(2) init will need to propagate the pid to second-stage init so the
process can be killed (or signalled).
(3) first-stage snapuserd will need to gracefully exit once it has no
active handler threads.
(4) second-stage init will need to invoke the transition helper on
SnapshotManager, ideally as soon as feasible.
Bug: 168259959
Test: manual test
Change-Id: I54dec2edf85ed95f11ab4518eb3d7dbaf0bdcbfd
* changes:
libsnapshot: Remove the timeout on client recv().
libsnapshot: Integrate with snapuserd.
snapuserd: Add an API call to wait for device deletion.
This simple tool will dump the COW header and included ops to stdout.
Bug: N/A
Test: mm inspect_cow && inspect_cow <file>
Change-Id: I369c4a21a84c95ffc10670bd9eeb2ceccb2a56d6
This reverts commit 2f77d1adc8.
Reason for revert: Applying a fix to DS directly. No need for Merged-In, since the topic has already landed in the DS branch.
Change-Id: I86cba9b20efebc9e700522e1697bc8f893c43089
This is so update engine can resume from the correct label.
Bug: 168554689
Test: vts_libsnapshot_test
Change-Id: Ib04e80e8219f954f105d5a85f86efa7bb9097579
Bug: 168554689
Test: vts_libsnapshot_test
Test: full OTA with update_device.py
Test: incremental OTA with update_device.py
Change-Id: I3878abfd767d2e47cf8486bc2c06233da2f1ef08
Critical services can now use the interface `critical
[window=<fatal crash window mins>] [target=<fatal reboot target>]` to
set up a timing window. If a service crashes more than 4 times within
that window, init regards it as a fatal system error and reboots the
system.
Configure `window=${zygote.critical_window.minute:-off}` and
`target=zygote-fatal` for all system-server services, so that a platform
that configures ro.boot.zygote_critical_window can escape a
system-server crash loop via the init fatal handler.
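
The "more than 4 crashes within the window" rule can be sketched as a sliding-window counter. This is an assumption-laden illustration, not init's actual bookkeeping: the class name CrashWindow, its parameters, and the use of a sliding window are all invented for the example.

```python
from collections import deque


class CrashWindow:
    """Track crash timestamps and flag when too many fall in one window."""

    def __init__(self, window_minutes, max_crashes=4):
        self.window_seconds = window_minutes * 60
        self.max_crashes = max_crashes
        self.crash_times = deque()

    def record_crash(self, now_seconds):
        """Record a crash; return True if it should be treated as fatal."""
        self.crash_times.append(now_seconds)
        cutoff = now_seconds - self.window_seconds
        # Forget crashes that happened before the window started.
        while self.crash_times and self.crash_times[0] < cutoff:
            self.crash_times.popleft()
        return len(self.crash_times) > self.max_crashes
```

With a 4 minute window, four crashes in quick succession are tolerated and the fifth is reported as fatal. Crashes spaced further apart than the window never accumulate.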
Bug: 146818493
Change-Id: Ib2dc253616be6935ab9ab52184a1b6394665e813
The value of entry.mount_point for data partition is "/data"
Fixes: 5ba5b90cd6 ("fs_mgr: try tune2fs for casefolding on /data only")
Test: got "Can't mount with encoding and encryption" problem reported
by the db845c build with the default 5.4.38 prebuilt kernel
Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org>
Change-Id: I226a2275f5f2ee18503c5a3863ef5a1d2c2ed7be
Two seconds is a bit aggressive - considering this is analogous to a
synchronous binder call, let's drop the timeout entirely.
Bug: 168554689
Test: vts_libsnapshot_test
Change-Id: I2b3f5b33f79575d72b15ed314dbcc0ad20ebd9a8
This integrates libsnapshot with dm-user and snapuserd. Tests progress
significantly further now. Tests involving merging still fail as
snapuserd doesn't support this yet.
Bug: 168554689
Test: vts_libsnapshot_test
Change-Id: I464b683b464fe29a646f0f2823b7f4434a878614
This adds a new message to the daemon protocol, which waits for a device
to be deleted. The caller must ensure that the corresponding control
device is actually going away (e.g., the device containing the dm-user
table entry has been deleted). Otherwise, this call will hang.
This will allow libsnapshot to safely delete the cow since any
outstanding references will be closed.
This also refactors DmUserHandler so that it's freed (and removed from
the handler list) if its corresponding thread exits of its own accord.
Bug: 168554689
Test: vts_libsnapshot_test
Change-Id: I8e97c543eec84874c88795a493470e992dc476fc
This refactors SnapuserdClient so it retains a connection for its
lifetime. This allows SnapshotManager to ensure the daemon is running
and hold a connection open across all of its operations.
The main impetus of this change is to remove the ambiguity between first
and second-stage sockets. SnapshotManager should only ever connect to
the first-stage socket during first-stage init, or to initiate the
"transition" step during second-stage init.
The transition steps are roughly:
(1) Start second-stage daemon.
(2) Load new device-mapper tables.
(3) Connect second-stage daemon to new dm-user devices.
(4) Activate the new tables, flushing IO to the first-stage daemon.
(5) Send a signal to the first-stage daemon to exit.
This patch makes it easier to hold these two separate connections.
Bug: 168554689
Test: manual test
Change-Id: I51cb9adecffb19143ed685e0c33456177ec3d81f
This is in preparation for moving to a traditional client/server model
where clients stay connected and the server multiplexes multiple
connections.
Client has been renamed to DmUserClient to differentiate it from local
socket clients.
poll() responsibilities have been moved into SnapuserdServer. In
addition, the server now tracks all open clients and polls them
together with the listen socket.
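
A minimal sketch of that polling model, assuming nothing about the real SnapuserdServer beyond what is described above: the class MultiplexingServer and its poll_once method are hypothetical names, and Python's selectors module stands in for the poll() loop.

```python
import selectors
import socket


class MultiplexingServer:
    """Poll one listen socket together with every connected client."""

    def __init__(self, listen_sock):
        self.selector = selectors.DefaultSelector()
        self.listen_sock = listen_sock
        self.clients = set()
        self.selector.register(listen_sock, selectors.EVENT_READ)

    def poll_once(self, timeout, on_message):
        """Wait for activity and service each ready socket once."""
        for key, _mask in self.selector.select(timeout):
            sock = key.fileobj
            if sock is self.listen_sock:
                # New connection: start tracking and polling it.
                conn, _addr = sock.accept()
                self.clients.add(conn)
                self.selector.register(conn, selectors.EVENT_READ)
                continue
            data = sock.recv(4096)
            if data:
                on_message(sock, data)
            else:
                # Empty read means the client hung up: stop tracking it.
                self.selector.unregister(sock)
                self.clients.discard(sock)
                sock.close()
```

Clients stay registered until they disconnect, so a caller can keep one connection open across several requests.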
SnapuserDaemon is now only responsible for signal masking. These two
classes can probably be merged together - I didn't do that here because
the patch was already large.
Bug: 168554689
Test: manual test
Change-Id: Ibc06f6287d49e832a8e25dd936ec07747a1b0555
GetLastLabel returns the last Label that a reader is confident about.
InitializeAppend starts a writer up to append data after the last given
label, assuming all later labels are not relevant data.
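
The append behaviour can be illustrated with a toy model in which a COW file is just a list of (kind, value) tuples. The function name initialize_append and the tuple layout are assumptions for this sketch and do not reflect the real on-disk format or the C++ API.

```python
def initialize_append(ops, label):
    """Return the ops to keep when resuming after `label`.

    Everything written after the label op is treated as not relevant
    and is dropped, so new ops are appended right after the label.
    """
    for index, op in enumerate(ops):
        if op == ("label", label):
            return ops[: index + 1]
    raise ValueError(f"label {label} not found")
```

For example, resuming at label 1 in a stream that also contains label 2 discards label 2 and every op that follows label 1.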
Change-Id: I3339d5527bae833d9293cbbc63126136b94bd976
Bug: 168829493
Test: cow_api_test
This switches up the format to alternate ops with data, followed by a
footer containing additional meta information. This allows the file to
be resumed at arbitrary points if writing gets interrupted by power
loss.
Also adds a label op, which allows labeling future ops as connected.
If the footer is missing, Append will treat the last label as possibly
corrupt, and ignore it.
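
The footer rule can be sketched with the same kind of toy model: ops are (kind, value) tuples, and a boolean says whether a valid footer was found. The function name get_last_label mirrors the API mentioned in this series, but the body is only an assumption about the described behaviour, not the reader's implementation.

```python
def get_last_label(ops, has_footer):
    """Return the last label that can be trusted, or None if there is none.

    Without a footer the write was interrupted, so the final label may
    be incomplete and is ignored.
    """
    labels = [value for kind, value in ops if kind == "label"]
    if not has_footer and labels:
        labels.pop()
    return labels[-1] if labels else None
```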
Change-Id: I126e15837d710776f9396e7afc9b0cd595e26b59
Bug: 168829493
Test: cow_api_test
This reverts commit 42c55f5ce9.
Reason for revert: b/171512004 It should be created at runtime.
Bug: 171512004
Change-Id: If9277f078cb343fbad825f0e8d1348d50f4b759a
This is to allow the tracing service to temporarily
lower kptr_restrict for the time it takes to build
its internal symbolization map (~200ms), only on
userdebug/eng builds.
kptr_restrict unfortunately cannot be lowered by
the tracing service itself. The main reason for that
is the fact that the kernel enforces a CAP_SYS_ADMIN
capability check at write() time, so the usual pattern
of opening the file in init and passing the FD to the
service won't work.
For more details see the design doc go/perfetto-kallsyms.
Bug: 136133013
Test: perfetto_integrationtests --gtest_filter=PerfettoTest.KernelAddressSymbolization in r.android.com/1454882
Change-Id: Ib2a8c69ed5348cc436223ff5e3eb8fd8df4ab860
As we change to a more resumable format, flush mostly just writes the
final parts of the file. When called before writing is done, that is
extra data which is not needed to continue writing and would immediately
be overwritten.
Additionally, in the next patch we will fsync the file after adding an
op, making the flush built in, and the Finalize name more appropriate.
Bug: 168829493
Test: builds
Change-Id: Iccc6580ac72ff066cfeeb32e3cdaf69c5ba615fc