We need to make progress both on adding the real interface for battery
level and cleaning up logging. This stands in the way of both.
Bug: http://b/77725702
Test: builds
Change-Id: Ia457e497606c2c7965d6895baebb26eef17857c9
Merged-In: Ia457e497606c2c7965d6895baebb26eef17857c9
To ensure a surprise reboot does not take the last boot reason on
face value especially if coming from more than one boot sessions ago.
We shift and clear the value from persist.sys.boot.reason to
sys.boot.reason.last and establish a correct last reboot reason in
the canonical sys.boot.reason property.
This effectively deprecates persist.sys.boot.reason as an API. They
should have been using sys.boot.reason instead for a correctly
determined reasoning.
Test: boot_reason_test.sh
Bug: 86671991
Merged-In: If85750704445088fd62978679ab3a30744c46abb
Change-Id: If85750704445088fd62978679ab3a30744c46abb
This change removes the CAP_SYSLOG file based capability from bootstat,
since the intention is that it should not be accessing the logs in the
long term. In order to avoid bitrot, the fallback code that depends on
CAP_SYSLOG has also been removed.
Bug: 62845925
Test: system/core/bootstat/boot_reason_test.sh
Change-Id: I899be44ef3ac1c4d81072f801d55c928ae09bb15
(partial cherry pick from commit fe3e762b6d)
Adding the boot sequence reported atom in ag/3518079 caused the duration
of bootstat to increase, as seen in b/72864061. I isolated the cause
down to calling BootReasonStrToReason. However, this function also gets
called in ReportBootReason, so I created another function that does the
parsing and sets the system boot reason property, and made
RecordBootReason and statsd logging get that property.
Bug: 72864061
Test: rebooted phone, verified boot events were received in adb shell
logcat -b stats and verified adb shell bootstat -p printed correct
values. Ran timing tests as well on walleye with 20 boots: before this
change, the average was ~150-160ms. After, it was ~80ms.
Change-Id: I92dbc9880328835647be7d9d50c7861b42f27bdb
Merged-In: I92dbc9880328835647be7d9d50c7861b42f27bdb
Received some clarity as to some of the boot reasons.
List of boot reasons and new translations to Canonical boot reason:
- "power_key" -> "cold,powerkey" (existing)
- "usb" -> "cold,charger" (existing)
- "rtc" -> "cold,rtc" (existing)
- "wdt" -> "reboot" (changed)
- "wdt_by_pass_pwk" -> "warm" (changed)
- "tool_by_pass_pwd" -> "reboot,tool" (changed)
- "2sec_reboot" -> "cold,rtc,2sec" (changed)
- "unknown" -> "reboot,unknown" (existing)
- "kernel_panic" (existing)
- "reboot" (existing)
- "watchdog" (existing)
Add the new string to enums for the new Boot Reason.
Test: boot_reason_test.sh (on affected device)
Bug: 74595769
Bug: 63736262
Change-Id: Iecedc3b1f7c47f26d0c77b1f316f745c6d2c1256
Some devices report the following canonical boot reason for all
shutdown operations:
reboot,kernel_power_off_charging__reboot_system
because shutdown switches to a charging kernel, and reboots into the
system when the user presses the power button. Thus last kernel
messages arrives as:
<0>.(0)[53:pmic_thread]reboot: Restarting system with command \
'kernel power off charging reboot system'
-> "shutdown" (w/o last boot reason)
-> "shutdown,<subreason>" (w/last boot reason)
The reboot reason from that charging instance propagates as a
fortified boot reason blocking interpretation of the last boot reason
that manages shutdown canonical boot reason determination. The fix
is to change reboot,kernel_power_off_charging__reboot_system to
shutdown, so that it is viewed as a blunt reason that can be
overridden by last boot reason.
We added the above boot reason to kBootReasonMap because the Bit
Error Handler can use it to reconstruct if there is any damage to
the last kernel messages content. The sad thing is that the enum
will never propagate as we are filtering it out and reporting
"shutdown" instead. Of course, we are now covered for a can not
happen.
Test: boot_reason_test.sh
Bug: 74595769
Bug: 63736262
Change-Id: I28987f0871af7d967cc4bbbffed43bd42349acdd
Found a kernel modem driver report:
Kernel panic - not syncing: subsys-restart: Resetting the SoC - modem crashed.
Which translates to the canonical boot reason, a wordy:
kernel_panic,subsys-restart:_resetting_the_soc_-_modem_crashed.
Shortening and ber matching the string, plus others that are possible,
to be more succinct, so added kernel_panic,{modem|adsp|dsps|wcnss}.
Test: build
Bug: 80553005
Change-Id: I969e1da896cd15b82e2fe11ceb77a5f54dfcfbc8
This test fails on most devices, gives us a report of devices that
do not propagate the boot reason via the bootloader. This should
become a bootloader required test.
Test: boot_reason_test.sh optional_rescueparty
Change-Id: Ibdc7b23b025e5d89d659ff08083b2e071343923c
Report kernel_panic,sysrq,livelock,<state> reboot reason via last
dmesg (pstore console). Add ro.llk.killtest property, which will
allow reliable ABA platforms to drop kill test and go directly
to kernel panic. This should also allow some manual unit testing
of the canonical boot reason report.
New canonical boot reasons from llkd are:
- kernel_panic,sysrq,livelock,alarm llkd itself locked up (Hail Mary)
- kernel_panic,sysrq,livelock,driver uninterrruptible D state
- kernel_panic,sysrq,livelock,zombie uninterrruptible Z state
Manual test assumptions:
- llkd is built by the platform and landed on system partition
- unit test is built and landed in /data/nativetest (could
land in /data/nativetest64, adjust test correspondingly)
- llkd not enabled, ro.llk.enable and ro.llk.killtest
are not set by platform allowing test to adjust all the
configuration properties and start llkd.
- or, llkd is enabled, ro.llk.enable is true, and killtest is
disabled, ro.llk.killtest is false, setup by the platform.
This breaks the go/apct generic operations of the unit test
for llk.zombie and llk.driver as kernel panic results
requiring manual intervention otherwise. If test moves to
go/apct, then we will be forced to bypass these tests under
this condition (but allow them to run if ro.llk.killtest
is "off" so specific testing above/below can be run).
for i in driver zombie; do
adb shell su root setprop ro.llk.killtest off
adb shell /data/nativetest/llkd_unit_test/llkd_unit_test --gtest_filter=llkd.${i}
adb wait-for-device
adb shell su root setprop ro.llk.killtest off
sleep 60
adb shell getprop sys.boot.reason
adb shell /data/nativetest/llkd_unit_test/llkd_unit_test --gtest_filter=llkd.${i}
done
Test: llkd_unit_test (see test assumptions)
Bug: 33808187
Bug: 72838192
Change-Id: I2b24875376ddfdbc282ba3da5c5b3567de85dbc0
This change allows bootstat to read the ro.boot.boottime_offset
property, which is set on devices where Android runs in a container.
This is because the CLOCK_BOOTTIME clock is not reset when (from
Android's perspective) the device restarts, since the host OS does not
restart itself.
Bug: 77273909
Test: CtsBootStatsTestCases
Change-Id: Ifb792864394be0b4686cc9d555c0afced856f4b4
Merged-In: Ifb792864394be0b4686cc9d555c0afced856f4b4
This change allows bootstat to read the ro.boot.boottime_offset
property, which is set on devices where Android runs in a container.
This is because the CLOCK_BOOTTIME clock is not reset when (from
Android's perspective) the device restarts, since the host OS does not
restart itself.
Bug: 77273909
Test: CtsBootStatsTestCases
Change-Id: Ifb792864394be0b4686cc9d555c0afced856f4b4
Provide some easy kernel panic subreasons mined from last kmesg,
generates a canonical boot reason (system boot reason) that
may aid triage.
Notably report kernel_panic,hung if [khungtaskd] triggers on
a livelock condition, forms a signals on the dashboards.
Helper function getSubreason modified to optionally enable checking
for a single quote resulting in a refactoring to ease maintenance of
the termination detection in the face of single bit errors heuristics.
Test: boot_reason_test.sh
Bug: 63736262
Bug: 33808187
Change-Id: I7fdd1e57e7a26095738175074306f0d2d59b1d69
Allow for a daemon to write to last kmsg to propagate a detailed
subreason to kernel_panic,sysrq actions. A minor refactor moves
common code into a helper function getSubreason for retrieval and
bit error correction operations.
A sysrq crash generally produces a kernel-provided message:
SysRq : Trigger a crash
which is used to generate a canonical boot reason kernel_panic,sysrq.
A user daemon could write to /dev/kmsg just prior to the sysrq with
SysRq : Trigger a crash : '<subreason>'
to change the canonical boot reason to kernel_panic,sysrq,<subreason>.
Administration added pending kBootReasonMap entries present in TRON.
Test: manual echo into /dev/kmsg and /proc/sysrq-trigger and check
Test: boot_reason_test.sh
Bug: 33808187
Bug: 63736262
Change-Id: Ibf5432737e5a3449ebe40a8c6cf2d3e912ed6bbc
Hopefully the quick property test is first, setting the stage for
ignoring future failures. This solves a problem with multiple
test failures directly attributed to a CTS compliance issue
with the bootloader's boot reason. One test fails, the remainder
get a pass on this one issue.
Test: boot_reason_test.sh
Bug: 63736262
Change-Id: Id56946b6f2f3a33d65bd1830543838f153290759
For aliasReasons allow one to optionally suppress needle for output
member using a <bang> (!) character prefix.
Test: boot_reason_test.sh
Bug: 63736262
Bug: 74595769
Change-Id: If35518c08cf909c6c78a16275e9d8dfd0ff839a9
For all known cases, if usb is present in the bootloader reason,
then it is actually reporting a cold,charger canonical boot reason.
This signifies that the device was powered down, and was woken
up by the charger being connected.
For all known cases, if rtc is present in the bootloader reason,
then it is actually reporting a cold,rtc canonical boot reason.
This signifies that the device was powered down, and was woken
up by the rtc clock.
Test: boot_reason_test.sh
Bug: 74595769
Bug: 63736262
Change-Id: I04bede0b7ccaa1b859943f7def93521a8f7b25c6
Add support for regex in aliasReasons for the alias member. Use this
new feature to check powerkey|power_key|PowerKey for a single entry.
Test: boot_reason_test.sh
Bug: 63736262
Change-Id: Ia6add99b9e33f3197643dbaab88dde20aa726f90
When we are matching existing known boot reasons, we should try with
compliant underlines first, then again with underlines replaced with
spaces. Replace references to Ber with BitError for maintenance
clarity. Replace helper functions with C++11 lambdas.
Test: boot_reason_test.sh
Bug: 63736262
Change-Id: I91b865013513526a55a85523080c7127f198968c
Two entries can be reused. The third "unknown" entry is not really
a duplicate since the kUnknownBootReason is not checked. Duplicate
entries reused in the future, should have
analysis/uma/configs/clearcut/TRON/histograms.xml updated first.
Test: boot_reason_test.sh
Change-Id: If2071a18160dc2c93e851fecc6b8c11fc76c9845
Use an alternate means to determine that the sysrq crash was
requested. Also, to be CTS compliant, the kernel_panic subreason
must be in lower case.
Test: boot_reason_test.sh
Bug: 74595769
Bug: 63736262
Change-Id: Ica06960ce62d220a909006e365951376d672b7e6
This commit disables "bootstat" in PDK builds because "bootstat" depends
on "libstatslog" (from "frameworks/base") which is not included in PDK
builds as well.
Test: Build a target (described in http://b/72961456) with
`platform.zip` built from master FSK source tree.
Bug: 72961456
Change-Id: I06b1555694510e17ea82d5c6dfcdeaf99b905e4d
Adding the boot sequence reported atom in ag/3518079 caused the duration
of bootstat to increase, as seen in b/72864061. I isolated the cause
down to calling BootReasonStrToReason. However, this function also gets
called in ReportBootReason, so I created another function that does the
parsing and sets the system boot reason property, and made
RecordBootReason and statsd logging get that property.
Bug: 72864061
Test: rebooted phone, verified boot events were received in adb shell
logcat -b stats and verified adb shell bootstat -p printed correct
values. Ran timing tests as well on walleye with 20 boots: before this
change, the average was ~150-160ms. After, it was ~80ms.
Change-Id: I92dbc9880328835647be7d9d50c7861b42f27bdb
Logs information about boot time and reason to statsd.
Specifically: bootloader boot reason, system boot reason, bootloader
boot time, total boot time, time that the boot finished, and time since
last boot.
Test: booted the phone and verified "adb logcat -b stats" received the
event
Change-Id: I769df9a09263ed3667f7085c81b3d072e868cbda
Integer overflow sanitized builds are throwing an error on the while
loop decrement in the rfind function. This refactors the loop to prevent
decrementing the value on the final iteration.
Test: Compiled and device boots without runtime error.
Bug: 30969751
Change-Id: Ice4532cce933062b3c14adf2d9749cfdea4ad84c
Merged-In: Ice4532cce933062b3c14adf2d9749cfdea4ad84c
We should have done this from the beginning. Thanks to Windows, we're not
going to be able to switch libbase over to std::string_view any time soon.
Bug: N/A
Test: ran tests
Change-Id: Iff2f56986e39de53f3ac484415378af17dacf26b
If the platform has no bootloader or pstore support, kernel_panic
test should fail if the results are not correct. Drop skipping of
failed test if pstore support is lacking.
If device demonstrably has pstore content support, the result must
be exacting kernel_panic,sysrq. Otherwise accept the less precise
result.
Test: On hikey960 (which currently lacks reliable pstore, or a
compliant bootloader reporting bootreason), expect failure of:
system/core/bootstat/boot_reason_test.sh kernel_panic
Bug: 63736262
Change-Id: I071a2a9c00dc522ec037c8a8997fea524d17e6e4
Create a private rfind that allows a fuzzy match based on a bit error
rate (BER) of 1 every 8 bits. last kmsg is affected by pstore ramoops
backing that suffers from data corruption. Add some additional
validation based on possible data corruption scenarios, as a noisy
match means higher chance of noisy data.
Noisy data notably can affect the battery level detection, but do not
typically result in false positives. Battery level, or failure, is
the responsibility of the BatteryStats service, providing a positive
signal and strong device-independent algorithm. The checking done in
bootstat is likely to be deprecated in favour of an API request to
BatteryStats once their algorithms deal with surprise outages due to
aging.
The kernel logging heuristic and BER fixup handily deals with a
prevalent issue where some bootloaders failure to properly notify us
of panics. This is where the gains are noticed with this improvement.
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: I93b4210f12fb47c5c036f4d6eb4cafeee4896d35
Replace simple strtoull with loop that ensures no leading zeros.
Restrict size of value buffer being checked as allocation was
going to end of retrieved buffer, which can cause unnecessary
memory pressure during boot.
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: Ifdc1d4fd3a73794c001577024ce7cbfde9c25028
If we sniff an unknown boot reason from last kmsg, make sure it
has a "reboot," prefix.
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: Ia1c401b8899d1f0c56bd4f5d8d2d19b7fc889a30
Add the test injection to known list, and deal with an error
propagation issue.
Test: system/core/bootstat/boot_reason_test.sh Its_Just_So_Hard_reboot
Bug: 63736262
Change-Id: I4799956978a8884c69c830fcedd7febd143651fd
blind_reboot_test() did not report the ro.boot.bootreason value,
making it more difficult to diagnose failures.
Test: system/core/bootstat/boot_reason_test.sh cold warm hard
Bug: 63736262
Change-Id: I313cfef202b1e06c583b0b47cd5d0e0888a7dbe7
Deal with regression from 557a9d4054
where bootstat was moved to exec_background and actions delayed
to improve boot time. Add a 1 second sleep rather than trying to
inspect the dmesg (debug only) for clues to the end of bootstat.
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: I87dfb6a07130112bf51c367632967efa53ea2534
Deal with regression from 9a3870490a
where additional content is logged and should be ignored.
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: I70709ba5b00ea18a653ff8d1658556b7d4c48775
Adding a set of automated engineering unit tests with a strict list of
prerequisites. Not meant for "user" builds. Must have a crafted
bootloader that does not set the boot reason. Only works on platforms
where the bootloader either by accident or specifically does not set
the ro.boot.bootreason via kernel command line configuration
androidboot.bootreason=. If the tests do not have the prerequisites,
the test will report success, but with protest.
These new tests should work on current Hikey and Hikey960 bootloaders
but could very well become obsolete if those platform bootloaders
start setting the boot reason.
We do not want a platform solution as it could allow a third party to
override the bootloader boot reason.
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: I1793184a8484b83e1d9077475bc65af9816dadf7
Use actual test durations to refine future duration estimates.
Better estimates are cosmetic, but lend confidence to the test
results.
SideEffects: none
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: I49143b78a6dc6fb21838a3d6c70b7eb5a8b4cba5
Add cold,powerkey, warm,s3_wakeup and hard,hw_reset so that
sys.boot.reason values can also be enumerated. Also add
some reserved speculative entries associated with forced
suspend to RAM and DISK; shutdown,suspend and shutdown,hibernate
respectively.
Test: system/core/bootstat/boot_reason_test.sh
Bug: 67636061
Change-Id: Ic43523748e6006aaca882f8eec7c1f0c08431bd8
Empty boot reason is mostly unexpected but may take up the bulk of
unknown reported boot reason values.
Bug: none
Test: none
Change-Id: I9978658c2b052d5cf5d28299861b0d1125ba9fc0
To make parsing easier for last reboot reason. This also ensures that
last boot reason matches the content that is typically returned by the
bootloader or in turn landed in the canonical system boot reason.
Simplify parsing in bootstat. Adjust and fix boot_reason_test.sh for
new reality. Allow boot reason tests battery and kernel_panic to pass
if device does not support pstore (empty before and after the test).
If device somehow landed in fastboot mode while waiting for the
display, issue a fastboot reboot to move the test along. Some cleanup
and standardization changes to the test script.
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: I97d5467c0b4a6d65df3525f1a2d0051db813d5ad
Change HandleSigtermSignal() handler to report shutdown,container. Add
the new reason to bootstat. Remove log stutter as
HandlPowerctlMessage will also do a LOG(INFO) reporting
shutdown,container as reason.
Sending SIGTERM to init is to allow a host OS to ask an Android
Container instance to shutdown. The temptation is to report
shutdown,sigterm but that does not accurately describe the usage
scenario.
Test: compile
Bug: 63736262
Change-Id: I3c5798921bdbef5d2689ad22a2e8103741b570b4
Establish a tighter trust relationship with digitization of the
battery level. Validate the reboot reason collected from the last
kmsg. Validate the last reboot reason before using for enhancing
system reboot reason.
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63636262
Change-Id: I45fbac4a33c09295cda89de3b73c93a884033e3b
Code style has drifted too far from our current clang-format
settings.
Test: compile
Side-Effects: none
Bug: 63736262
Change-Id: Ie390459db123cfadf09aa06c8dafb1ae69a72af8
If bootloader reports kernel_panic, gives us permission for a
deeper check of which kind occurred. Resolve a problem with
some kernels that do not have the 'sysrq: SysRq :' log stutter.
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: Idb8978de7a4f4e3c671c6ab7efc5ebe841f7bbd8
Run bootstat as a oneshot service rather than as a series of inline
exec commands. exec commands impact boot time.
Test: cts-tradefed run cts-dev --module CtsBootStatsTestCases
system/core/bootstat/boot_reason_test.sh all
(make sure it filters out new init reports)
Bug: 65736247
Change-Id: Ic9d509a8cbee4bc1e278081de1001e25ae0915fd
Send the message multiple times into the last dmesg log so that one
may be picked up without data corruption.
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: Ie42ad1940b1eb4915e4cf6cc61815d0275a70ffe
If the bootloader insists on reporting reboot for cold, warm and
hard, we need to reconstruct canonical reason from the
persist.sys.boot.reason property.
Some log lines contained bootstate, letting their noise through
during testing, changed regex to look for bootstat[^e].
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: I3788c6cf8aca7fc73dd01bf95acb596d18ed7ee4
For reboot [reboot_arg] requests via either reboot or adb reboot,
if reboot_arg is empty then report "shell" or "adb" respectively.
Test: boot_reason_test.sh shell_reboot adb_reboot
Bug: 63736262
Change-Id: Ie613d9e62db6a705885e4e7520aede27af3aa1b9
Adding functionality to bootstat --record_boot_complete and
--record_boot_reason to initialize sys.boot.reason, the canonical
system boot reason.
Filter out ro.boot.bootreason oem noise into sys.boot.reason. Add
heuristics to determine what the boot reason is, when otherwise would
be defaulting to the blunt and relatively devoid of detail catch-all
reboot reasons ("reboot", "shutdown", "cold", "warm", "hard").
boot_reason_test.sh is also a compliance test.
Test: boot_reason_test.sh all
Bug: 63736262
Change-Id: Ic9a42cccbcfc89a5c0e081ba66d577a97c8c8c76
bootstat does not need root uid and root gid permissions to perform
its tasks. It appears that system uid and log gid are adequate and
appropriate.
Test: manual
Bug: 63736262
Change-Id: I094c2cb054e441562fa8717a4d3dc0086fb70a7a