Store pertinent information about userspace reboot events in the case
of failure. This information is any services which failed to stop
cleanly, the output of the default fstab and /proc/mounts, and
a list of mounts which failed to unmount. This information is only
stored as necessary (i.e. mount information will not be stored if
everything unmounted, even if some services failed to stop).
Added new /metadata/userspacereboot directory to persist this
information. Information older than 3 days will be deleted.
Test: adb reboot userspace with sigterm/sigkill timeouts set to
very low values
Test: Manual test of storing all other information
Bug: 151820675
Change-Id: I6cfbfae92a7fc6f6c984475cad2c50c559924866
Having mounted apexes with loop back devices backing files on /data
partition will prevent clean unmount of it. Unmounting them and tearing
down loop devices should minimize the risk of that.
Note that it won't fix the issue completely, as there are a few (~2-3)
processes that keep restarting even after SIGKILL is sent. Which means
that they can still hold references to apexes on /data partition. But
in practice probability of this is quite low.
Test: adb reboot
Test: put tzdata apex in /data/apex/active && adb reboot
Bug: 158152940
Change-Id: I4624567b3d0f304dba4c6e37b77abd89e57411de
Init starts ueventd in the default mount namespace to support loading
firmware from APEXes.
Bug: 155023652
Test: devices boots
adb$ nsenter -t (pid of ueventd) -m ls /apex
=> shows all APEXes
Change-Id: Ibb8b33a07eb014752275e3bca4541b8b694dc64b
To ensure we can shutdown cleanly, and don't hang an outstanding
requests to a FUSE host daemon that has already exited.
Bug: 153411204
Test: inspect logs during shutdown
Change-Id: I8e6479bd54dbc1fc85b087617aa6b16be9f15a3b
The exit of init panics the system *after* process context (mm, stack,
...etc.) are recycled, according to Linux kernel's 'do_exit'
implementation. To preserve most init process context for debugging,
triggers the panic via proc-sysrq explicitly.
Note: after this change, there will be no "Attempt to kill init" panic
when androidboot.init_fatal_panic is set.
Test: Insert data abort fault in init, the full process context is
preserved in memory dump captured after panic.
Bug: 155940351
Change-Id: I3393bd00f99b8cb432cfa19a105b7d636b411764
(cherry picked from commit be1cf9006a)
The exit of init panics the system *after* process context (mm, stack,
...etc.) are recycled, according to Linux kernel's 'do_exit'
implementation. To preserve most init process context for debugging,
triggers the panic via proc-sysrq explicitly.
Note: after this change, there will be no "Attempt to kill init" panic
when androidboot.init_fatal_panic is set.
Test: Insert data abort fault in init, the full process context is
preserved in memory dump captured after panic.
Bug: 155940351
Change-Id: I3393bd00f99b8cb432cfa19a105b7d636b411764
Since this function is used in userspace reboot, we need to be more
diligent with error handling, e.g.:
* If init fails to read /sys/block/zram0/backing_dev, then fail and
fallback to hard reboot.
* Always call swapoff.
* Always reset zram.
* Tear down loop device only if zram is backed by a loop device.
Test: adb reboot userspace
Bug: 153917129
Change-Id: I4709da1d08cf427ad9c898cfb2506b6a29f1d680
Merged-In: I4709da1d08cf427ad9c898cfb2506b6a29f1d680
(cherry picked from commit a840d405eb)
Since this function is used in userspace reboot, we need to be more
diligent with error handling, e.g.:
* If init fails to read /sys/block/zram0/backing_dev, then fail and
fallback to hard reboot.
* Always call swapoff.
* Always reset zram.
* Tear down loop device only if zram is backed by a loop device.
Test: adb reboot userspace
Bug: 153917129
Change-Id: I4709da1d08cf427ad9c898cfb2506b6a29f1d680
Similarly to other recovery mechanisms, timeout is controlled by a
read-only property that can be configured per-device.
Test: adb root
Test: adb shell setprop init.userspace_reboot.started.timeoutmillis 2
Test: adb reboot userspace
Bug: 152803929
Change-Id: Id70710b46da798945ac5422ef7d69265911ea5ef
Merged-In: Id70710b46da798945ac5422ef7d69265911ea5ef
(cherry picked from commit d05535485f)
Similarly to other recovery mechanisms, timeout is controlled by a
read-only property that can be configured per-device.
Test: adb root
Test: adb shell setprop init.userspace_reboot.started.timeoutmillis 2
Test: adb reboot userspace
Bug: 152803929
Change-Id: Id70710b46da798945ac5422ef7d69265911ea5ef
Devices in the lab are hitting an issue where they're getting stuck
likely in the sync() call in DoReboot() before we start the reboot
monitor thread and before we shut down services.
It's possible that concurrent writing to RW file systems is causing
this sync() call to take essentially forever. To protect against
this, we need to remove this sync(). Note that we will still call
sync() after shutting down services.
Note that the service shutdown code has a timeout and there is a
reboot monitor thread that will shutdown the device if more than 30
seconds pass above that timeout. This change increases that timeout
to 300 seconds to give the final sync() calls explicitly more time to
finish.
Bug: 150863651
Test: reboot functions normally
Test: put an infinite loop in DoReboot and the the reboot monitor thread
triggers and shuts down the device appropriately
Merged-In: I6fd7d3a25d3225081388e39a14c9fdab21b592ba
Change-Id: I6fd7d3a25d3225081388e39a14c9fdab21b592ba
(cherry picked from commit 10615eb397)
Devices in the lab are hitting an issue where they're getting stuck
likely in the sync() call in DoReboot() before we start the reboot
monitor thread and before we shut down services.
It's possible that concurrent writing to RW file systems is causing
this sync() call to take essentially forever. To protect against
this, we need to remove this sync(). Note that we will still call
sync() after shutting down services.
Note that the service shutdown code has a timeout and there is a
reboot monitor thread that will shutdown the device if more than 30
seconds pass above that timeout. This change increases that timeout
to 300 seconds to give the final sync() calls explicitly more time to
finish.
Bug: 150863651
Test: reboot functions normally
Test: put an infinite loop in DoReboot and the the reboot monitor thread
triggers and shuts down the device appropriately
Change-Id: I6fd7d3a25d3225081388e39a14c9fdab21b592ba
A previous change moved property_service into its own thread, since
there was otherwise a deadlock whenever a process called by init would
try to set a property. This new thread, however, would send a message
via a blocking socket to init for each property that it received,
since init may need to take action depending on which property it is.
Unfortunately, this means that the deadlock is still possible, the
only difference is the socket's buffer must be filled before init deadlocks.
This change, therefore, adds the following:
1) A lock for instructing init to reboot
2) A lock for waiting on properties
3) A lock for queueing new properties
A previous version of this change was reverted and added locks around
all service operations and allowed the property thread to spawn
services directly. This was complex due to the fact that this code
was not designed to be multi-threaded. It was reverted due to
apparent issues during reboot. This change keeps a queue of processes
pending control messages, which it will then handle in the future. It
is less flexible but safer.
Bug: 146877356
Bug: 148236233
Bug: 150863651
Bug: 151251827
Test: multiple reboot tests, safely restarting hwservicemanager
Merged-In: Ice773436e85d3bf636bb0a892f3f6002bdf996b6
Change-Id: Ice773436e85d3bf636bb0a892f3f6002bdf996b6
(cherry picked from commit 802864c782)
This is apparently causing problems with reboot.
This reverts commit d2dab830d3.
Bug: 150863651
Test: build
Merged-In: Ib8a4835cdc8358a54c7acdebc5c95038963a0419
Change-Id: Ib8a4835cdc8358a54c7acdebc5c95038963a0419
A previous change moved property_service into its own thread, since
there was otherwise a deadlock whenever a process called by init would
try to set a property. This new thread, however, would send a message
via a blocking socket to init for each property that it received,
since init may need to take action depending on which property it is.
Unfortunately, this means that the deadlock is still possible, the
only difference is the socket's buffer must be filled before init deadlocks.
This change, therefore, adds the following:
1) A lock for instructing init to reboot
2) A lock for waiting on properties
3) A lock for queueing new properties
A previous version of this change was reverted and added locks around
all service operations and allowed the property thread to spawn
services directly. This was complex due to the fact that this code
was not designed to be multi-threaded. It was reverted due to
apparent issues during reboot. This change keeps a queue of processes
pending control messages, which it will then handle in the future. It
is less flexible but safer.
Bug: 146877356
Bug: 148236233
Bug: 150863651
Bug: 151251827
Test: multiple reboot tests, safely restarting hwservicemanager
Change-Id: Ice773436e85d3bf636bb0a892f3f6002bdf996b6
This is apparently causing problems with reboot.
This reverts commit 7205c62933.
Bug: 150863651
Test: build
Change-Id: Ib8a4835cdc8358a54c7acdebc5c95038963a0419
A previous change moved property_service into its own thread, since
there was otherwise a deadlock whenever a process called by init would
try to set a property. This new thread, however, would send a message
via a blocking socket to init for each property that it received,
since init may need to take action depending on which property it is.
Unfortunately, this means that the deadlock is still possible, the
only difference is the socket's buffer must be filled before init deadlocks.
There are possible partial solutions here: the socket's buffer may be
increased or property_service may only send messages for the
properties that init will take action on, however all of these
solutions still lead to eventual deadlock. The only complete solution
is to handle these messages asynchronously.
This change, therefore, adds the following:
1) A lock for instructing init to reboot
2) A lock for waiting on properties
3) A lock for queueing new properties
4) A lock for any actions with ServiceList or any Services, enforced
through thread annotations, particularly since this code was not
designed with the intention of being multi-threaded.
Bug: 146877356
Bug: 148236233
Test: boot
Test: kill hwservicemanager without deadlock
Merged-In: I84108e54217866205a48c45e8b59355012c32ea8
Change-Id: I84108e54217866205a48c45e8b59355012c32ea8
(cherry picked from commit 7205c62933)
If init is wedged, then the write will never succeed and reboot won't
happen.
Also, in case of normal reboot, move call to PersistRebootReason to the
top of DoReboot() function, to make sure we persist it even if /data is
not mounted.
Test: builds
Test: adb shell svc power reboot userspace
Test: atest CtsUserspaceRebootHostSideTestCases
Bug: 148767783
Change-Id: I4ae40e1f6fdc41cc0bcae57020fa3d3385dda1b4
Merged-In: I4ae40e1f6fdc41cc0bcae57020fa3d3385dda1b4
If init is wedged, then the write will never succeed and reboot won't
happen.
Also, in case of normal reboot, move call to PersistRebootReason to the
top of DoReboot() function, to make sure we persist it even if /data is
not mounted.
Test: builds
Test: adb shell svc power reboot userspace
Test: atest CtsUserspaceRebootHostSideTestCases
Bug: 148767783
Change-Id: I4ae40e1f6fdc41cc0bcae57020fa3d3385dda1b4
A previous change moved property_service into its own thread, since
there was otherwise a deadlock whenever a process called by init would
try to set a property. This new thread, however, would send a message
via a blocking socket to init for each property that it received,
since init may need to take action depending on which property it is.
Unfortunately, this means that the deadlock is still possible, the
only difference is the socket's buffer must be filled before init deadlocks.
There are possible partial solutions here: the socket's buffer may be
increased or property_service may only send messages for the
properties that init will take action on, however all of these
solutions still lead to eventual deadlock. The only complete solution
is to handle these messages asynchronously.
This change, therefore, adds the following:
1) A lock for instructing init to reboot
2) A lock for waiting on properties
3) A lock for queueing new properties
4) A lock for any actions with ServiceList or any Services, enforced
through thread annotations, particularly since this code was not
designed with the intention of being multi-threaded.
Bug: 146877356
Bug: 148236233
Test: boot
Test: kill hwservicemanager without deadlock
Change-Id: I84108e54217866205a48c45e8b59355012c32ea8
Helps with support of recovery and rollback boot reason history, by
also using /metadata/bootstat/persist.sys.boot.reason to file the
reboot reason.
Test: manual
Bug: 129007837
Change-Id: Id1d21c404067414847bef14a0c43f70cafe1a3e2
Instead they will be logged from system_server. This CL just prepares
grounds for logging CL to land.
Test: adb reboot userspace
Bug: 148767783
Change-Id: Ie9482ef735344ecfb0de8a37785d314a3c0417ff
Also reset some more properties to make bootanimation work properly.
Test: adb reboot userspace
Bug: 148172262
Change-Id: I0154d4fe9377c019150f5b1a709c406925db584d
When BCB (bootloader message structure inside of misc partition) is
malformed (contains some non-printable characters in its fields),
"reboot recovery" command won't be able to write required string to
"command" field. It can happen for example when partition table was
created anew and 'misc' partition area contains some garbage. Also this
behavior can be emulated with this command:
$ fastboot erase misc
which leads to 'misc' partition to be filled with 0xFF characters. Hence
this code:
if (boot.command[0] == '\0') {
won't let us to set new string to "command" field. Let's check if
"command" field is malformed and fix it, before actually checking for
previously set content.
"fastboot erase" shouldn't be used for testing purposes though, as it
doesn't work sometimes due to alignment, on bootloader side:
Erasing blocks 6144 to 6144 due to alignment
........ erased 0 bytes from 'misc'
Instead one might use "dd" command to fill 'misc' with 0xFF's:
$ dd if=/dev/zero ibs=2k count=1 | tr "\000" "\377" >misc.img
$ fastboot flash misc misc.img
Test: Fill 'misc' partition with 0xFF's, then do "adb reboot recovery"
Change-Id: Ica8ca31012b9b2249645e7305830c07a20dd013c
Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Since I was there, added two more properties to reset, and switched
ordering of sys.init.updatable_crashing and
sys.init.updatable_crashing_process_name setprops to make sure that
process name is already set when apexd/PackageWatchdog get's notified
about sys.init.updatable_crashing.
Also fixed a typo in what HandleUserspaceReboot function.
Test: adb reboot userspace
Bug: 135984674
Change-Id: I954ec49aae0734cda1bd833ad68f386ecd808f73
Such services will be re-parsed and added back to the service list
during post-fs-data stage.
Test: adb reboot userspace
Test: atest CtsInitTestCases
Bug: 145669993
Bug: 135984674
Change-Id: Ibb393dfe0f101c4ebe37bc763733fd5d981d3691
Init is no longer a special case and talks to property service just
like every other client, therefore move it away from property_set()
and to android::base::SetProperty().
In doing so, this change moves the initial property set up from the
kernel command line and property files directly into PropertyInit().
This makes the responsibilities between init and property services
more clear.
Test: boot, unit test cases
Change-Id: I36b8c83e845d887f1b203355c2391ec123c3d05f
sys.init.userspace_reboot.in_progress will be used to notify all
the processes (including vendor ones) that userspace reboot is
happening, hence it should be treated as stable public api.
All other sys.init.userspace_reboot.* props will be internal to /system
partition and don't require any stability guarantees.
Test: builds
Test: adb reboot userspace
Bug: 135984674
Change-Id: Ifb64a6bfae2de76bac67edea68df44e33c9cfe2d