It's impractical to test the contents of the stack trace, but we
should at least test that *a* stack trace is present, which would
have caught the bug fixed by r.android.com/1306754.
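
A minimal sketch of such a presence check, assuming backtrace frames are
printed in the usual "#NN pc <hex address>" form. HasBacktraceFrame is a
hypothetical helper for illustration, not the assertion used in
debuggerd_test:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `text` contains at least one line
// that looks like a backtrace frame, e.g. "#00 pc 0000000000001234".
// It deliberately ignores what the frame points at.
bool HasBacktraceFrame(const std::string& text) {
  static const std::regex kFrame(R"(#\d{2} pc [0-9a-f]+)");
  return std::regex_search(text, kFrame);
}
```

This only proves that some frame was emitted, which is the property the
test needs.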
Bug: 135772972
Change-Id: Ic5e0b997caa53c7eeec4e5185df5c043c9d4fe3d
Teach debuggerd to use the new scudo APIs proposed in
https://reviews.llvm.org/D77283 for extracting MTE error reports from crashed
processes, and include those reports in tombstones if possible.
Bug: 135772972
Change-Id: I082dfd0ac9d781cfed2b8c34cc73562614bb0dbb
log/log.h primarily concerns itself with writing logs. The few users
who read logs should directly include log/log_read.h.
Bug: 78370064
Test: build
Change-Id: Ie95c55ea2ffc76fc95768323d445ada6ad4f2520
If crash_dump dies before it gets a chance to write to the pipe we use
to let the debugged-process know that it successfully started, we
weren't cleaning up the child we fork to start it, leaving a zombie
child.
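
The failure mode can be sketched as follows. This is an illustration
only, not the debuggerd handler code: SpawnAndReapOnFailure is a
hypothetical name, and the child simply exits where crash_dump would
have died.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that dies before writing the
// "started" byte to the pipe, then reaps it so no zombie is left behind.
// Returns true if the dead child was reaped.
bool SpawnAndReapOnFailure() {
  int pipefd[2];
  if (pipe(pipefd) != 0) return false;

  pid_t child = fork();
  if (child == -1) {
    close(pipefd[0]);
    close(pipefd[1]);
    return false;
  }
  if (child == 0) {
    // Child: simulate dying before reporting a successful start.
    _exit(1);
  }

  close(pipefd[1]);  // The parent keeps only the read end.
  char buf;
  ssize_t rc;
  do {
    rc = read(pipefd[0], &buf, 1);
  } while (rc == -1 && errno == EINTR);
  close(pipefd[0]);

  if (rc == 1) return false;  // Child reported success; not this sketch's path.

  // EOF or error: the child never reported in, so wait for it explicitly.
  int status;
  pid_t reaped;
  do {
    reaped = waitpid(child, &status, 0);
  } while (reaped == -1 && errno == EINTR);
  return reaped == child;
}
```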
Bug: http://b/152119184
Test: debuggerd_test
Change-Id: Id01cc05f693995e9998941774f74ab8e3d8b4d8a
On aarch64, the top 8 bits (i.e. the tag bits) of the fault address
in si_addr are always clear. This isn't ideal for MTE, which requires
these bits in order to correctly diagnose tag mismatches.
A proposed kernel patch [1] exposes the full fault address including
the tag bits as part of the ucontext. Change debuggerd to read this
fault address if available.
[1] https://patchwork.kernel.org/patch/11435077/
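
For illustration, splitting a 64-bit address into its top byte and the
remaining bits looks like this. AddressTag and UntagAddress are
hypothetical helper names, not debuggerd functions:

```cpp
#include <cstdint>

// The top byte of a 64-bit aarch64 address holds the tag bits.
constexpr uint64_t kTagShift = 56;
constexpr uint64_t kAddressMask = (uint64_t{1} << kTagShift) - 1;

// Returns the top 8 bits of the address.
constexpr uint8_t AddressTag(uint64_t addr) {
  return static_cast<uint8_t>(addr >> kTagShift);
}

// Returns the address with the top 8 bits cleared, which is the form
// si_addr reports.
constexpr uint64_t UntagAddress(uint64_t addr) {
  return addr & kAddressMask;
}
```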
Bug: 135772972
Change-Id: Ia05be574113860f4e9ecc36a310c4b740e0c4afb
GWP-ASan uses frame-pointer based unwinding internally on
allocation/deallocation to collect stack traces that are used when
crashes are reported.
This should be generic, so pull it out into libunwindstack so it can be
used by MTE as well.
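
A rough sketch of frame-pointer based unwinding, assuming the common
frame record layout where the frame pointer addresses a pair of
{caller's frame pointer, return address}. FrameRecord and
UnwindWithFramePointers are hypothetical names, not the libunwindstack
interface:

```cpp
#include <cstddef>
#include <cstdint>

// Assumed frame record layout: saved caller frame pointer, then the
// return address.
struct FrameRecord {
  const FrameRecord* next;
  uintptr_t return_address;
};

// Walks the frame record chain starting at `fp` and stores up to
// `max_frames` return addresses in `out`. Every record must lie inside
// [stack_lo, stack_hi). Returns the number of frames collected.
size_t UnwindWithFramePointers(const FrameRecord* fp, uintptr_t stack_lo,
                               uintptr_t stack_hi, uintptr_t* out,
                               size_t max_frames) {
  size_t count = 0;
  while (count < max_frames) {
    uintptr_t addr = reinterpret_cast<uintptr_t>(fp);
    // Stop on records that are out of bounds or misaligned.
    if (addr < stack_lo || addr >= stack_hi ||
        stack_hi - addr < sizeof(FrameRecord) ||
        addr % alignof(FrameRecord) != 0) {
      break;
    }
    out[count++] = fp->return_address;
    // The stack grows down, so a caller's record sits at a higher
    // address. Anything else means the chain ended or is corrupt.
    const FrameRecord* next = fp->next;
    if (reinterpret_cast<uintptr_t>(next) <= addr) break;
    fp = next;
  }
  return count;
}
```

Only the frame pointer, the stack bounds, and memory access are needed,
which is why the same walk can serve other allocator-side callers.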
Bug: 152412331
Test: atest debuggerd_test
Change-Id: I27b32263aac63446f5fe398af108676b70cd3971
Similar to r.android.com/1247247 I'll be adding more of them for MTE.
Also, change the protocol between the crasher and crash_dump to make
it easier to add new fields and change the referenced data structures
without needing to worry about versioning. The version number for
static executables is now always 1 (where the protocol will never
change), while the version number for dynamic executables is always
4 (where the protocol can change, because the linker and crash_dump
are version locked).
Bug: 135772972
Change-Id: Ib4696d0544d7c87cb429aaaa15f18c3640059e16
We're now using it in contexts that don't have all of the registers available,
such as GWP-ASan and soon MTE, so it doesn't make sense to have it be a
member function of Regs.
Bug: 135772972
Change-Id: I18b104ea0adb78588d7e475d0624cefc701ba52c
- Create a static library libunwindstack_no_dex without DEX support.
- Use it in libdebuggerd_handler_fallback, whose only use is in the
linker, which shouldn't need that support.
- Use it in init_first_stage, which doesn't need DEX support either.
- Also need a libbacktrace_no_dex since it's in the dependency chain
from init_first_stage to libunwindstack_no_dex.
Also restrict the *_no_dex libs and libdebuggerd_handler_fallback as
much as possible to avoid inadvertent use of these reduced
functionality libs.
Test: m init_first_stage on Cuttlefish
where BOARD_BUILD_SYSTEM_ROOT_IMAGE=false
Test: m system_image com.android.runtime
Test: Build & boot
Test: atest linker-unit-tests libunwindstack_unit_test debuggerd_test
Bug: 142944931
Bug: 151466650
Change-Id: Iaacb29bfe602f3ca12a00a712e2a64c45ff0118b
A future change will introduce a version lock between linker and
crash_dump. Move crash_dump into the runtime APEX alongside linker in order to
ensure that they will be the same version even if the runtime APEX is updated.
Bug: 135772972
Change-Id: Ic2eae31b6927eb0e8a62315ac141f50933c00bcc
Merged-In: Ic2eae31b6927eb0e8a62315ac141f50933c00bcc
We're now passing around a couple of addresses for GWP-ASan in addition
to abort_msg_address and fdsan_table_address, and I'm going to need to add
more of them for MTE. Move them into a data structure in order to simplify
various function signatures.
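
As a sketch of the idea, with a hypothetical struct name and
illustrative GWP-ASan field names (only abort_msg_address and
fdsan_table_address are named above):

```cpp
#include <cstdint>

// Hypothetical grouping of the addresses handed from the crashing
// process to the dumper. The GWP-ASan field names are assumptions.
struct CrashAddresses {
  uintptr_t abort_msg_address = 0;
  uintptr_t fdsan_table_address = 0;
  uintptr_t gwp_asan_state = 0;
  uintptr_t gwp_asan_metadata = 0;
};

// Callees take one parameter instead of one per address, so adding a
// field does not change any signature.
bool HasGwpAsanData(const CrashAddresses& addresses) {
  return addresses.gwp_asan_state != 0 && addresses.gwp_asan_metadata != 0;
}
```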
Bug: 135772972
Change-Id: Ie01e1bd93a9ab64f21865f56574696825a6a125f
On userdebug/eng devices, check a system property to see whether we
should create tombstones or not. OEMs that would rather have core dumps
can set this property and configure /proc/sys/kernel/core_pattern
appropriately.
Bug: https://issuetracker.google.com/149663286
Test: set the property, cause a crash
Change-Id: If894b4582a1820b64bdae819cec593b7710cb6e3
GWP-ASan can provide information about a crash that it caused. Grab the
GWP-ASan regions from the globals shared by the linker for crash-handler
purposes, pull the information from GWP-ASan, and display it.
This adds two regions:
1. Causality tracking by GWP-ASan. We now print a cause header about
the crash, like `Cause: [GWP-ASan]: Use After Free on a 1-byte
allocation at 0x7365bb3ff8`
2. Allocation and deallocation stack traces.
Bug: 135634846
Test: atest debuggerd_test
Change-Id: Id28d5400c9a9a053fcde83a4788f971e677d4643
Use Android.bp instead of Android.mk to build and install the
crash_dump.policy files. This also allows mainline modules to pull
the files into their apex (Android.mk did not handle this dependency).
Bug: 147914640
Test: build, examine generated filesystem
Change-Id: Iae92d4f9d683ccfddf1716e7eb2877b7bff0c737
This takes a lot of space, isn't convincingly useful, and makes it
likely that the far more valuable stuff that comes after it gets
truncated. So let's just drop it.
Bug: http://b/139860930
Test: manual crasher, presubmit
Change-Id: Ie417ffc07e3cb17e95fdb3d183f8c87de0f34b89
1 page isn't enough to log on AArch64, and clean pages are free, so
increase the stack size to 8 pages.
Bug: http://b/144887737
Test: treehugger
Change-Id: I731b3bc27ab37f4b830a9478a04cd34d4f7648d3
GWP-ASan's crash information retrieval service requires a Printf()
function (declared by the system/implementing allocator). In this
instance, because _LOG is called with additional arguments (the log_t),
this function must be wrapped to conform to printf_t defined by
GWP-ASan.
We can easily wrap the variadic version.
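
The wrapping can be sketched like this. Log and LogV are stand-ins for
log_t and the va_list flavour of the logger, and PrintfFn is an assumed
shape for the callback type; none of these are the real declarations:

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Stand-in for debuggerd's log_t.
struct Log {
  std::string buffer;
};

// Stand-in for a logging function that takes the extra log argument.
void LogV(Log* log, const char* fmt, va_list ap) {
  char line[512];
  vsnprintf(line, sizeof(line), fmt, ap);
  log->buffer += line;
}

// Assumed callback shape: printf-like, with no room for a log argument.
using PrintfFn = void (*)(const char* fmt, ...);

// The callback cannot carry user data, so the target log is stashed in
// a global before the callback is handed out.
static Log* g_current_log = nullptr;

// Adapter matching PrintfFn that forwards to the log-aware function.
void PrintfToLog(const char* fmt, ...) {
  if (g_current_log == nullptr) return;
  va_list ap;
  va_start(ap, fmt);
  LogV(g_current_log, fmt, ap);
  va_end(ap);
}
```

A caller would set g_current_log and then pass PrintfToLog wherever a
PrintfFn is expected.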
Bug: 135634846
Test: atest debuggerd_test
Change-Id: I17209cd2b7455ce889e2f8194969f606cac329eb
A thread's PSTATE can sometimes be critical for understanding a crash,
especially with MTE and other new features that store per-thread state
in PSTATE.
Bug: 135772972
Change-Id: I1bee25bffe7eea395f04b6449dc9227298cf866e
logger_entry and logger_entry_v2 were used for the kernel logger,
which we have long since deprecated. logger_entry_v3 is the same as
logger_entry_v4 without a uid field, so it is trivially removable,
especially since we're now always providing uids in log messages.
liblog and logd already get updated in sync with each other, so we
have no reason for backwards compatibility with their format.
Test: build, unit tests
Change-Id: I27c90609f28c8d826e5614fdb3fe59bde22b5042
debuggerd_client.race seems to have suddenly started to flake, for no
apparent reason. This doesn't seem to reproduce locally, so increase
the timeouts to rule out our test VMs being slow.
Bug: http://b/142571257
Test: treehugger
Change-Id: Ic54a78b8da36cb1163cec7e7976c73c3da628a30
C++20 requires designated initializers to appear in member declaration
order, unlike C99.
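
A small illustration, assuming the ordering in question is that of
designated initializers. Options and MakeOptions are made-up names:

```cpp
// Hypothetical struct used only to show the ordering rule.
struct Options {
  int timeout_ms;
  bool verbose;
  const char* name;
};

Options MakeOptions() {
  // The designators follow the declaration order of the members.
  // C99 would also accept {.name = ..., .timeout_ms = ...}; C++20 does
  // not permit that reordering.
  return Options{.timeout_ms = 500, .verbose = true, .name = "crash_dump"};
}
```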
Bug: 139945549
Test: mm
Change-Id: I3cbca589511c1e0bbc10c691949e18de77e16031
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
We're missing useful crashes, especially on hwasan builds.
Bug: http://b/140580637
Test: run crasher
Change-Id: Ib5d8d3bd3fc4d7fec77d0b10302e5595f97a3515
There is still some flakiness, so increase the timeout values.
Also remove the TEMP_FAILURE_RETRY macro usage in TIMEOUT calls.
That macro disables the ability of the alarm code to interrupt
the system call.
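
The interaction can be reproduced in isolation. In this sketch (the
function name is hypothetical) a timer signal interrupts a blocking
read(). Wrapping the read in a retry-on-EINTR loop, which is what
TEMP_FAILURE_RETRY does, would restart the call and hide the timeout:

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

// The handler only has to exist so that the signal interrupts the
// system call instead of killing the process.
static void OnAlarm(int) {}

// Returns true if a blocking read() on an empty pipe was interrupted
// by the timer signal.
bool ReadIsInterruptedByAlarm() {
  int fds[2];
  if (pipe(fds) != 0) return false;

  struct sigaction sa = {};
  sa.sa_handler = OnAlarm;  // No SA_RESTART: read() must fail with EINTR.
  sigemptyset(&sa.sa_mask);
  struct sigaction old_sa;
  sigaction(SIGALRM, &sa, &old_sa);

  // Fire after 50ms and keep firing, in case the first signal arrives
  // before read() has started blocking.
  itimerval timer = {};
  timer.it_value.tv_usec = 50 * 1000;
  timer.it_interval.tv_usec = 50 * 1000;
  setitimer(ITIMER_REAL, &timer, nullptr);

  char c;
  ssize_t rc = read(fds[0], &c, 1);  // Deliberately not retried on EINTR.
  int saved_errno = errno;

  itimerval disarm = {};
  setitimer(ITIMER_REAL, &disarm, nullptr);
  sigaction(SIGALRM, &old_sa, nullptr);
  close(fds[0]);
  close(fds[1]);
  return rc == -1 && saved_errno == EINTR;
}
```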
Bug: 141045754
Test: Unit tests pass.
Change-Id: Ia3c95dccc3076a3fd5ef6432097a57e4ccee4df3
The fdsan code uses getrlimit/ugetrlimit, so allow those syscalls when
running the debuggerd unit tests.
Bug: 141045754
Test: Ran the offending tests hundreds of times without failure.
Change-Id: Iece94f03e7895d61ca8a8f3ab17dce7e54ddf9cd
Catch as many early-boot crashes as we can by starting tombstoned
immediately after /data is mounted.
Bug: http://b/139864948
Test: adb shell su 0 dmesg | grep "starting service"
Change-Id: I7f8821102191a445e87020f3efa59a2e0620d9db
Since only privileged processes with CAP_SYS_ADMIN can read kernel
stack traces from /proc/*/stack, we dump the waiting channels
instead to provide some insight as to where the process might
be stuck in the kernel.
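
A minimal sketch of reading a thread's wait channel from procfs.
ReadWchan is a hypothetical helper, not the debuggerd implementation:

```cpp
#include <fstream>
#include <string>
#include <sys/types.h>

// Hypothetical helper: reads /proc/<pid>/task/<tid>/wchan into `out`.
// Returns false if the file cannot be opened, for example because the
// thread has already exited.
bool ReadWchan(pid_t pid, pid_t tid, std::string* out) {
  std::string path = "/proc/" + std::to_string(pid) + "/task/" +
                     std::to_string(tid) + "/wchan";
  std::ifstream in(path);
  if (!in) return false;
  std::getline(in, *out);
  return true;
}
```

On success the string names the kernel function the thread is sleeping
in, which gives a hint about where it is blocked without needing the
full kernel stack.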
Bug: 135458700
Fixes: 135458700
Test: adb shell am hang; Check /data/anr/<anr-file> for
wchan data.
Change-Id: I9f13511ad89a259ce5e5465155db15d45d2c46d8
Test: Ran new unit tests.
Test: Ran crasher stack-overflow, crasher64 stack-overflow and verified
Test: stack overflow cause is shown.
Test: Ran stack overflow app and verified tombstone includes stack-overflow
Test: message.
Change-Id: I9bb01186dff5ed81c77d84b6aaedb5332ddd7256
This is for Android Telemetry to be able to categorise the processes
that produce tombstones.
Test: atest debuggerd_test:TombstoneTest
Change-Id: Ie635347c9839eb58bfd27739050bd68cbdbf98da
Modify the unwinder library to indicate that at least one of the stack
frames contains an elf file that is unreadable.
Modify debuggerd to display a note about the unreadable frame and a possible
way to fix it.
Bug: 129769339
Test: New unit tests pass.
Test: Ran an app that crashes and has an unreadable file and verified the
Test: message is displayed. Then setenforce 0 and verify the message is
Test: not displayed.
Change-Id: Ibc4fe1d117e9b5840290454e90914ddc698d3cc2
There appears to be a kernel bug that causes SIGHUP and SIGCONT to be
sent to the parent process group we spawn from if the process group
contains stopped jobs (e.g. the parent itself, because of wait_for_gdb).
Call setsid in all of our children to prevent this from happening.
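
The pattern in isolation, as a sketch with a hypothetical function name
rather than the debuggerd code:

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that moves itself into a new session, and reports
// whether the child became the leader of that session.
bool ChildGetsOwnSession() {
  pid_t child = fork();
  if (child == -1) return false;
  if (child == 0) {
    // A freshly forked child is not a process group leader, so setsid()
    // is allowed. Afterwards it no longer belongs to the parent's
    // process group.
    bool ok = setsid() != -1 && getsid(0) == getpid();
    _exit(ok ? 0 : 1);
  }

  int status;
  pid_t reaped;
  do {
    reaped = waitpid(child, &status, 0);
  } while (reaped == -1 && errno == EINTR);
  return reaped == child && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```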
Bug: http://b/31124563
Test: adb shell 'setprop debug.debuggerd.wait_for_gdb 1; killall -ABRT surfaceflinger'
Change-Id: I1a48d70886880a5bfbe2deb80d48deece55faf09