For user who would like to retain the crash symptom and avoid device
from power cycle for live debugging, set
init.svc_debug.no_fatal.<svc_name> to "true" to skip FATAL reboot.
Bug: 177593855
Change-Id: I0bdb6191e5963c08e1ea301a60060acf916dd49b
There are sysfs nodes that don't take multiple inputs, adding a new
copy_per_line built-in command to copy from source file to destination
line by line.
Bug: 171740453
Test: boot and check file and log
Change-Id: I41b7a565829299d56b81d4509525dfa6a0a52444
The critical services can now using the interface `critical
[window=<fatal crash window mins>] [target=<fatal reboot target>]` to
setup the timing window that when there are more than 4 crashes in it,
the init will regard it as a fatal system error and reboot the system.
Config `window=${zygote.critical_window.minute:-off}' and
`target=zygote-fatal' for all system-server services, so platform that
configures ro.boot.zygote_critical_window can escape the system-server
crash-loop via init fatal handler.
Bug: 146818493
Change-Id: Ib2dc253616be6935ab9ab52184a1b6394665e813
The README.md states that this ordering is not guaranteed to give
flexibility for the future, however it's time to state that this
ordering is guaranteed, especially since:
1) We have a tests, EventTriggerOrder and
EventTriggerOrderMultipleFiles, which have guaranteed this ordering
since 2017.
2) We have users requesting and depending on this order
Also update some slightly out of date parts of the documentation:
1) We import /system/etc/init/hw/init.rc instead of /init.rc as the
first import
2) We additionally import /system_ext/etc/init and /product/etc/init
Test: n/a
Change-Id: I6d7b8d9e52f0d52bee320d5074ebb74a537f9150
There is documentation for how AIDL works with ctl commands or
interface_start commands, however it seems we were missing documentation
on declaration of interfaces.
Bug: N/A
Test: N/A
Change-Id: I0e5d2350b6b847a870eafbc69828e75f1f6ca4f0
While mount_all and umount_all were updated to use ro.boot.fstab_suffix,
I neglected to update swapon_all. Trivially copied from umount_all.
Bug: 142424832
Change-Id: Icd706fe7a1fe16c687cd2811b0a3158d7d2e224e
Merged-In: Icd706fe7a1fe16c687cd2811b0a3158d7d2e224e
The mount_all and swapon_all commands are documented, but umount_all
is not. Add some documentation.
Bug: 142424832
Change-Id: I7e4dcb4d222b787350a79c9e312062cac9eeb4d8
Currently the ReadDefaultFstab function, which calls GetFstabPath,
makes some assumptions about what the fstab will be called and where
it is located. This is being used by vold to set up userdata encryption
and for gsid, and is even used in the default boot control HAL, so it
has become quite baked.
The original way for a board to specify things to mount was to use the
"mount_all /path/to/fstab" command in init.rc. However, due to the
above functionality, the path after mount_all is no longer very useful,
as it cannot differ from the inferred path, or userdata encryption and
other features will be broken.
On Cuttlefish, we have an interest in being able to test alternative
userdata configurations (ext4 vs f2fs, encryption on/off, etc.) and
currently the only way to achieve this is to either a) modify the
ro.hardware or ro.hardware.platform properties, which breaks a bunch
of things like default HAL filenames, or regenerate our odm.img or
vendor.img filesystems. We can't simply install another fstab and
point to it with "mount_all".
This change allows the fstab path to be omitted from "mount_all", and
adds another property which overrides the existing checks for
fstab.${ro.hardware} and fstab.${ro.hardware.platform}. Specifying
${ro.boot.fstab_suffix} will cause fstab.${ro.boot.fstab_suffix}
to be checked first.
Bug: 142424832
Test: booted cuttlefish with 'mount_all ${ro.hardware} --late'
Test: booted cuttlefish with 'mount_all --late'
Test: booted cuttlefish with 'mount_all --late' and fstab_suffix=f2fs
Test: partially booted cuttlefish with 'mount_all ${ro.hardware}'
Test: partially booted cuttlefish with 'mount_all'
Change-Id: I3e10f66aecfcd48bdb9ebf1d304b7aae745cbd3c
A one second timeout is so coarse and can affect boot time when
the possibility that the file does not exist. Switch to accepting
a floating point number for seconds for the wait for file command.
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 151950334
Test: wait_for_file sleep 0.05 reports an appropriate delay
Change-Id: I8d8ed386519ab54270b05ce91663d0add30f12e7
Introduce new command to allow setting task profiles from inside .rc
script. This is to replace usage of writepid when a service is trying
to join a cgroup. Usage example from a .rc file:
service surfaceflinger /system/bin/surfaceflinger
task_profiles HighPerformance
Bug: 155419956
Test: change .rc file and confirm task profile is applied
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I0add9c3b363a7cb1ea89778780896cae1c8a303c
Some services are lazy HALs on some platforms and not lazy HALs on
others; this is known at runtime by hwservicemanager, so this change
adds these properties to allow hwservicemanager to turn one oneshot
(for lazy HALs). It may also be required to make a lazy HAL not lazy
anymore, and oneshot_off is provided for this.
Bug: 147841742
Test: new unit test that turn on and off oneshot on a service (bootanim)
and observes that it follows the expected behavior
Change-Id: I79524e2c9a5008f90c8d3bc40920fde00602a439
FscryptSetDirectoryPolicy no longer tries to infer the action from the
filename. Well mostly; it still assumes top-level directories in /data
should be encrypted unless the mkdir arguments say otherwise, but
it warns.
Bug: 26641735
Test: boot, check log messages
Change-Id: Id6d2cea7fb856f17323897d85cf6190c981b443c
Some services are not native android services and therefore don't log
via the normal mechanisms. This gives developers an option to have
their stdout/stderr logs sent directly to kmsg.
Test: see test prints to kernel log
Change-Id: I7973ea74d5cab3a90c2cd9a3d5de2266439d0c01
This replaces the recently added `exec_reboot_on_failure` builtin, since
it'll be cleaner to extend service definitions than extending `exec`.
This is in line with what we decided when adding `exec_start` instead
of extending `exec` to add parameters for priority.
Test: `exec_start` a service with a reboot_on_failure option and watch
the system reboot appropriately when the service is not found and when
the service terminates with a non-zero exit code.
Change-Id: I332bf9839fa94840d159a810c4a6ba2522189d0b
clang-tidy hinted that some of this code wasn't right. Looking
deeper, there is really not much related to file and socket
descriptors, except that they're published in similar ways to the
environment. All of the abstraction into a 'Descriptor' class takes
us further away from specifying what we really mean.
This removes that abstraction, adds stricter checks and better errors
for parsing init scripts, reports sockets and files that are unable to
be acquired before exec, and updates the README.md for the passcred
option.
Test: build, logd (uses files and sockets) works
Change-Id: I59e611e95c85bdbefa779ef69b32b9dd4ee203e2
Importing rc files during mount_all was at best a stop gap until
Treble's first stage mount and at worst a bad idea. It doesn't have a
reason to exist now that first stage mount exists and is required, and
always had edge cases where init could not handle loading some aspects
of scripts after it had started processing actions.
This change removes this functionality for devices launching after Q.
Test: devices boot
Change-Id: I3181289572968637b884e150d36651f453d40362
This keyword was introduced to support restarting services on devices
using APEX and FDE. The current implementation is not a restart, but
rather a 'reset' followed by a 'start', because the real /data must be
mounted in-between those two actions. But we effectively want this to be
a restart, which means that we also want to start 'disabled' services
that were running at the time we called 'class_reset_post_data'.
To implement this, keep track of whether a service was running when its
class was reset at post-data, and start all those services.
Bug: 132592548
Test: manual testing on FDE Taimen
Change-Id: I1e81e2c8e0ab2782150073d74e50e4cd734af7b9
Merged-In: I1e81e2c8e0ab2782150073d74e50e4cd734af7b9
Add a property ro.boottime.init.first_stage to provide us a
first stage init duration from start to exec completed in
nanoseconds.
For consistency, report nanoseconds duration for
ro.boottime.init.selinux as well instead of milliseconds.
Now also report consistently from start to exec completed
instead of just the selinux load time.
SideEffects: ro.boottime.init.selinux is reported to TRON and
may alarm with the millionfold increase in precision.
ro.boottime.init is now also consistent with ns
precision.
Test: inspect
Bug: 124491153
Bug: 129780532
Change-Id: Iff4f1a3a1ab7ff0a309c278724c92da0832b9a69
Before, if updatable processes crash 4 times in 4mins, a native
rollback will be attempted. This behavior does not detect
system_server early boot deadlocks because the system server requires
at least a min to detect a deadlock, and crash itself. The crashes
don't happen frequently enough for init to detect.
After, this cl, the old behavior exists and additionally, init detects
*any* 4 crashes of updatable processes before boot completed,
regardless of if they happen within 4mins or not.
Test: Manually tested by adding artificial sleep in system_server so
deadlock is triggered before boot. system_server crashes 4 times in
over 4mins and the ro.init.updatable_crashing prop is set to 1.
Bug: 129597207
Change-Id: Ie6fb5693ff4be105bcbe139c22850fb076e40260
On devices that use FDE and APEX at the same time, we need to bring up a
minimal framework to be able to mount the /data partition. During this
period, a tmpfs /data filesystem is created, which doesn't contain any
of the updated APEXEs. As a consequence, all those processes will be
using the APEXes from the /system partition.
This is obviously not desired, as APEXes in /system may be old and/or
contain security issues. Additionally, it would create a difference
between FBE and FDE devices at runtime.
Ideally, we restart all processes that have started after we created the
tmpfs /data. We can't (re)start based on class names alone, because some
classes (eg 'hal') contain services that are required to start apexd
itself and that shouldn't be killed (eg the graphics HAL).
To address this, keep track of which processes are started after /data
is mounted, with a new 'mark_post_data' keyword. Additionally, create
'class_reset_post_data', which resets all services in the class that
were created after the initial /data mount, and 'class_start_post_data',
which starts all services in the class that were started after /data was
mounted.
On a device with FBE, these keywords wouldn't be used; on a device with
FDE, we'd use them to bring down the right processes after the user has
entered the correct secret, and restart them.
Bug: 118485723
Test: manually verified process list
Change-Id: I16adb776dacf1dd1feeaff9e60639b99899905eb
The code to read verity state in userspace is deprecated in favor of
having the bootloader read and report the state, so this change
removes this now unused code.
Bug: 73456517
Test: boot
Change-Id: Ib626fd61850bce3016179ca92a9831c2ac29c032
Set properties dev.mnt.blk.<mount_point>=<device_block_class> for mount
and umount operations by setting up an Epoll handler to catch
EPOLLERR or EPOLLPRI signals when /proc/mounts is changed. Only
update properties associated with block devices. For the mount
point of /, use the designation of /root instead.
Can use the properties in init rc expansion like:
on property dev.mnt.blk.root=*
write /sys/block/${dev.mnt.blk.root}/queue/read_ahead_kb ${boot_read_ahead_kb:-2048}
on property dev.mnt.blk.data=*
write /sys/block/${dev.mnt.blk.data}/queue/read_ahead_kb ${boot_read_ahead_kb:-2048}
on late-fs
setprop boot_read_ahead_kb 128
write /sys/block/${dev.mnt.blk.root}/queue/read_ahead_kb ${boot_read_ahead_kb}
write /sys/block/${dev.mnt.blk.data}/queue/read_ahead_kb ${boot_read_ahead_kb}
Test: boot and inspect getprop results.
Bug: 124072565
Change-Id: I1b8aff44f922ba372cd926de2919c215c40ee874
In particular, this allows services running as the root user to have
capabilities removed instead of always having full capabilities.
Test: boot device with a root service with an empty capabilities
option in init showing no capabilities in /proc/<pid>/status
Change-Id: I569a5573ed4bc5fab0eb37ce9224ab708e980451
*/build.prop files are now loaded much earlier than before; from 'on
post-fs' to the time when the property service is started which is
before init starts the action loop.
This ensures that all processes that are launched by init have a
consistent view of system properties. Previously, the processes that
started before 'on post-fs' were initially with the small number of
sysprops loaded from */default.prop and then suddenly get additional
sysprops from */build.prop while they are executing.
Bug: 122714998
Test: device boots
Change-Id: Ic07528421dfbe8d4f43673cea41175d33cfbf298
With all of the changes made to the early init boot phase, the
README.md needs updating for future referencing.
Test: none
Change-Id: Ia572577c683add449a4e091ffd4d1597682e9325
A service with 'updatable' option can be overriden by the same service
definition in APEXes.
/system/etc/init/foo.rc:
service foo /system/bin/foo
updatable
/apex/myapex/etc/init.rc:
service foo /apex/myapex/bin/foo
override
Overriding a non-updatable (i.e. without updatable option) service
from APEXes is prohibited.
When an updatable service is started before APEXes are all activated,
the execution is delayed until when the APEXes are all activated.
Bug: 117403679
Test: m apex.test; adb push <built_apex> /data/apex; adb reboot
adb shell, then lsof -p $(pidof surfaceflinger) shows that
the process is executing
/apex/com.android.example.apex@1/bin/surfaceflinger instead of
/system/bin/surfaceflinger
Change-Id: I8a57b8e7f6da81b4d2843e261a9a935dd279067c
Init now parses *.rc files from the APEXs when the apexd notifies the
mount event via apexd.status sysprop.
Bug: 117403679
Test: m apex.test; adb root; adb push <builtfile> /data/apex; adb reboot
adb root; adb shell setprop ctl.start apex.test; dmesg shows that init
tries to start the service which doesn't exist.
[ 47.979657] init: Could not ctl.start for 'apex.test': Cannot find '/apex/com.android.example.apex/bin/test': No such file or directory
Change-Id: I3f12355346eeb212eca4de85b6b73257283fa054
The memcg.limit_percent option can be used to limit the cgroup's
max RSS to the given value as a percentage of the device's physical
memory. The memcg.limit_property option specifies the name of a
property that can be used to control the cgroup's max RSS. These
new options correspond to the arguments to the limitProcessMemory
function in frameworks/av/media/libmedia/MediaUtils.cpp; this will
allow us to add these options to the rc files for the programs that
call this function and then remove the callers in a later change.
There is also a change in semantics: the memcg.* options now have
an effect on all devices which support memory cgroups, not just
those with ro.config.low_ram or ro.config.per_app_memcg set to true.
This change also brings the semantics in line with the documentation,
so it looks like the previous semantics were unintentional.
Change-Id: I9495826de6e477b952e23866743b5fa600adcacb
Bug: 118642754
Bug: 117828597
Test: bugreport launches with a test property set to appropriate keys
Test: bugreport doesn't launch with the test property unset
Test: no errors seen in build or boot in either of the above cases
Change-Id: Iea27032080a0a7863932b1c1b573857ac66b56b5
'Critical' services have rebooted into bootloader, like all other
catastrophic init crashes, for years now. Update the text to match.
Test: n/a
Change-Id: Icfc41bf3e383958f14ecfaab9ca187e2c3dc7fd9
This keyword can (and should) be used multiple times when multiple
services are served together. I've documented this here.
Bug: N/A
Test: N/A
Change-Id: Ie986c9cac486db346555f359e9ccbed93d8d1d22
Allow services to specify a custom restart period via the
restart_period service option. This will allow services to be run
periodically, such as a service that needs to run every hour.
Allow services to specify a timeout period via the timeout_period
service option. This will allow services to be killed after the
timeout expires if they are still running. This can be combined with
restart_period for creating period services.
Test: test app restarts every minute
Change-Id: Iad017820f9a602f9826104fb8cafc91bfb4b28d6
Due to a bug with ParseUint(), init would defacto accept -1 for an
infinite rlimit, but only on 64bit devices. That bug is now fixed,
such that -1 would be rejected by ParseUint() for all devices.
This change explicitly checks for -1 for all devices or 'unlimited' to
match ulimit's reporting and accepts either as an infinite rlimit.
Bug: 112668205
Test: new (and old) unit tests
Change-Id: Ie28ff622cdf375a65ceb5f32ffb14fb3d5d9f2ba
Add the ability to enter a network namespace when launching a service.
Typical usage of this would be something similar to the below:
on fs
exec ip netns add namespace_name
service vendor_something /vendor/...
capabilities <lower than root>
user not_root
enter_namespace net /mnt/.../namespace_name
Note changes to the `ip` tool are needed to create the namespace in
the correct directory.
Bug: 73334854
Test: not yet
Change-Id: Ifa91c873d36d69db399bb9c04ff2362518a0b07d