It is expensive to keep the non-protobuf path around and it hasn't
been used for an entire release without anyone noticing, so remove it.
Create new end-to-end unit tests that cover tests of the non-proto
code paths that are being deleted.
Bug: 197981919
Test: Unit tests pass.
Change-Id: Ia1c45572300bd63e5f196ad61e5e5386830c8ece
- Use "likelihood" instead of "probability" since that has connotations
of being less precise, and our probability ordering isn't very precise
anyway.
- Hide the fault address with SEGV_MTEAERR because it is not available.
- Pad the fault address with leading zeroes to make it clearer which
bits of the top byte (and any following bytes such as PAC signature
bits) are set.
Bug: 206015287
Change-Id: I5e1e99b7f3e967c44781d8550bbd7158eb421b64
On the main thread, the siginfo pointer will never be nullptr.
Add a CHECK to make sure this is true.
Test: Unit tests pass both 32 bit and 64 bit.
Test: Ran with debug.debuggerd.translate_proto_to_text set to 0
Test: to exercise old path.
Change-Id: I9d5ed0de5d652de8a4f9cd85eb57cbb1ec676404
This code was added, but a svelte config still tries to use scudo
related code that doesn't exist.
Bug: 201007100
Test: Ran unit tests on normal config.
Test: Ran unit tests on svelte config.
Change-Id: Ic84bae37717d213121aef182bac2f82dbee25213
strerror is nice, but usually I don't care about the text, I care about
the uppercase enum
Bug: N/A
Test: N/A
Change-Id: I8ea86220cb04cbded701379c47b8aba8ea8864b8
I was here because we have a case where timeout(1) kills logcat, but
debuggerd alleges that the process that was killed had started less than
a second ago. I'm not sure this is the problem there, but I did notice
that far too many tombstones were claiming improbably short process
uptimes. It turns out that the code was measuring the *thread* uptime,
not the *process* uptime.
Also simplify the code a bit by switching to sysinfo(2) rather than
reading a file.
Test: manual, plus the existing unit test
Change-Id: Ie2810b1d5777ad9182be92bfb3f60795dc978b24
The libunwindstack code will attempt to dlopen the libdexfile.so
when a dex pc is found. Unfortunately, this failed since that
library was not properly listed as a runtime library. To make
sure this doesn't happen again, add an end to end test that
will create a dex pc frame, and will verify the correct
dex function name is in that frame.
Bug: 199043576
Test: Unit test passes on arm/aarch64/x86/x86_64.
Test: Removed the runtime_libs of libdexfile from libunwindstack
Test: and verified the new test fails.
Change-Id: I3a11f9ee44e06e37a547d193b04f7fbb90ccfe0a
The output.text.fd value is only ever -1 when there is a failure.
There is no need to check both < 0 or -1, so only check for -1.
Test: Unit tests pass.
Test: Verified the message is seen on intercept and not on
Test: regular crashes.
Change-Id: I1eddcd5d2342b268ceb261b246c98b10cee85bb4
Revert "Allow visibility on libdexfile for all libdexfile_suppor..."
Revert "Add libdexfile runtime dependency of libdexfile_support."
Revert "Add libdexfile runtime dependency of libdexfile_support."
Revert submission 1810760-libdexfile-runtime-2
Reason for revert: DroidMonitor: Potential culprit for Bug 198352910 - verifying through Forrest before revert submission. This is part of the standard investigation process, and does not mean your CL will be reverted.
Reverted Changes:
If4da968e4:Add libdexfile runtime dependency of libdexfile_su...
I80162942a:Allow visibility on libdexfile for all libdexfile_...
Iab18abc8e:Add libdexfile runtime dependency of libdexfile_su...
I473d146d8:Add libdexfile runtime dependency of libdexfile_su...
Change-Id: Iacab8e0a5c74e0c3185a155e35c28903aa9acb4a
When the switch was made to dump the tombstone from the protobuf,
the fault address marker in the maps section went missing. Re-add
that logic and add new unit tests to verify all of the different
behaviors.
Bug: 193935960
Test: All unit tests pass.
Test: All unit tests pass when setprop debug.debuggerd.translate_proto_to_text 0
Test: The above on cuttlefish, 32 bit and 64 bit.
Test: The above on a flame, 32 bit and 64 bit.
Change-Id: I098bb6ab4bacacae2ca0fc5ec9a73549ed0b9489
The "missing output fd" message can seem like an error, so modify
the message to indicate what is really happening. This message
will occur normally when running the debuggerd command, or when
a bugreport is generated, or when an ANR occurs. In all of those
cases, this is not an error, but an expected action.
Bug: 196189981
Test: Ran debuggerd -b and debuggerd and verified this message is seen.
Test: Ran unit tests.
Change-Id: I6e3d5a76d92b972c77fca301ea7147745bc67c37
The tombstone will add a newline after the abort message, so remove
any trailing newlines before saving/printing.
Bug: 196414062
Test: Unit tests pass.
Test: Set system property debug.debuggerd.translate_proto_to_text to 0
test: and unit tests still pass.
Change-Id: I0d3dc215eb5d8be93d99e5b9d4f0a14b1d61396d
A lot of things had moved out of system/core/ without their TEST_MAPPING
entries having gone with them, reducing the amount of presubmit coverage
for those things.
In order to reduce the likelihood of that happening again, I've pushed
all that remained in the system/core/ TEST_MAPPING down into the
individual subdirectories.
Test: treehugger
Change-Id: Ib75d65f9200fa64ae1552471da6fbe5b7023cf94
After compiler update, infinite side effect free loops are replaced with trap
instructions. So use -fno-finite-loop to disable this behavior.
Bug: 196162833
Test: run debuggerd_test.
Change-Id: I057263360a5df64af18c17a025fab48887d0b470
When running debuggerd from the command line, it's possible that
the signal will happen on a side thread. The original intercept
in tombstoned is set to only handle crashes from the main thread
pid, so in this case, the intercept doesn't occur. To fix this,
modify the code so that running debuggerd always sends the signal
to the main pid. In addition, modify the signal handler is entered
due to the BIONIC_SIGNAL_DEBUGGER signal, then the crashing tid is
set to the main thread pid instead of the current thread.
Add unit test to cover this case.
Bug: 194346289
Test: All unit tests pass.
Test: Verify the new unit test is getting the signal on the non-main
Test: thread and still properly handling the intercept.
Test: Modify the debuggerd code to send the signal to the non main pid
Test: and verify the dump still occurs correctly.
Change-Id: I2dd1bd11fc8ef4a6fe87f05ecc67ae349a101c82
For the new kernel 5.13 heders, there is a new TRAP_PERF value that
needs to be handled.
Test: Builds.
Change-Id: I2c6658ca94423c210db9ad6692ec69f6be69b3f5
Dumping stack in ANR can fail, but error message is only printed
to logcat. To allow easier debugging of such cases we add the
error messages in the ANR file as well.
Also factor out some duplication, inline single-call functions to
their call sites, and make some of the lambdas clearer by only having
implicit state unrelated to their primary purpose in captures but
passing as arguments things they fundamentally need to do their job
(and actually adding some duplication for time_left() which was subtle
enough to fool me into thinking that we only needed one call of
set_timeout(), which I've renamed to update_timeout()).
Bug: 191172191
Test: Manual
Change-Id: I39a50ca5b72059bfeff48b010d3be44f19eb32fa
writepid command usage to join a cgroup has been deprecated in favor
of a more flexible approach using task_profiles. This way cgroup path
is not hardcoded and cgroup changes can be easily made. Replace
writepid with task_profiles command to migrate between cgroups.
Bug: 191283136
Test: build and boot
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I945c634dfa7621437d8ea3981bce370d680b7371
Using thread cache will cause SIGSEGV for 32bit+kernel4.9 device.
Bug: 190579082
Bug: 189803009
Test: run cts -m CtsSeccompHostTestCases
Change-Id: I47b13d02674aadbacd8dac36d8382eed0885413c
Signed-off-by: yidong zhang <yidong.zhang@amlogic.com>
I'm guessing that the original
F crash_dump64: crash_dump.cpp:460] failed to attach to thread 1671, already traced by 0 ()
was probably a race, where there _was_ a tracer but they disappeared?
Whatever, it doesn't seem helpful to show "already traced by nobody",
and we also don't want to clobber errno in the fallthrough case
(previously just where get_tracer() failed, but now also where
get_tracer() returns "nobody").
Bug: http://b/188668580
Test: treehugger
Change-Id: I3fa3b4f7e32531d48dfbb0ef946ff351ed5d9171
All but three files are Apache-2.0 already.
Bug: http://b/191499510
Test: /google/src/files/head/depot/google3/wireless/android/busytown/ayeaye/analyzers/copyright/tools/scan_android_project.sh ~/aosp/system/core/debuggerd/ | grep -v APACHE
Change-Id: I430c3382dd160e398f02470d7053ecea39c98f41
The code in the fallback path calls pthread_key_create when using the
normal thread cache. However, this code is executed out of the linker,
which means that the call doesn't see keys created by the libc version
of pthread_key_create. As of now, simply avoid using the thread cache
to avoid this problem.
Bug: 189803009
Test: debuggerd -b on a media process on a 32 bit Android Go device
Test: and observe no crash.
Test: debuggerd unit tests pass.
Change-Id: I9ca1a55e44d3bb69d49450826d7d64d7a64145c3
(cherry picked from commit 49e5a76544)
This information clearly meets the bar for being dumped to logcat. If we
omit the info, we may confuse the user into thinking that it's not
available at all, especially if it's their first time seeing an MTE
report.
This also adds some functionality to the integration testing library to
pull logcat messages and scan them to make sure the contents are in both
places.
Bug: 187881237
Test: atest debuggerd_test # on QEMU w/ MTE.
Change-Id: Icc17ea45bda7628331cc4812eaad3bc5c949b7a7
This type of error is unlikely and attempting to detect it with MTE
is likely to produce false positive reports. Make sure that this type
of error is not detected by the allocator.
Change-Id: I90676d1a031411d6b725890311317802bc24b459
This does not currently cause any problems but it does block progress
on the referenced bugs.
Bug: 187910671
Bug: 187914588
Test: m droid
Change-Id: I977cd842101187441ddbc873eac25598295aab06
When moving to the proto-ized tombstones, the note about unreadable
elf files in a backtrace got lost. This re-adds it and adds a test
to verify that the note properly shows up.
Bug: 185428454
Test: Ran unit tests.
Change-Id: I1150cc737772e1b79fd73ec5c782caadc4629421
A change was made so that pthread_create is calling
prctl(PR_PAC_RESET_KEYS, ...) on aarch64. It's possible that other
seccomp policies might need to change to allow this.
Test: CrasherTest.seccomp_backtrace passes on aarch64.
Change-Id: I9c4d1b3dca5f19a6285bf904bb942f1f52e42bd0
Talk of "gdb" when we currently mean "gdb or lldb" and will soon mean
"lldb" is starting to confuse people. Let's use the more neutral
"debugger" in places where it really doesn't matter.
The switch from gdbclient.py to lldbclient.py is a change for another
day...
Test: treehugger
Change-Id: If39ca7e1cdf4c8bb9475f1791cdaf201fbea50e0
Bug: http://b/181927912
Clang already has -Wfree-nonheap-object but it became a default warning
with clang-r416183
Test: compile crasher.cpp
Change-Id: Ice532e9f373a628e07acd08a4fc7bfa7cf5d4e08
Proto tombstones were missing tagged fault addresses, tagged_addr_ctrl,
tags in memory dumps and Scudo and GWP-ASan error reports. Since text
tombstones now go via protos, all of these features broke when we
switched to text tombstones generated from protos by default. Fix
the features by adding support for them to the proto format,
tombstone_proto and tombstone_proto_to_text.
Bug: 135772972
Bug: 182489365
Change-Id: I3ca854546c38755b1f6410a1f6198a44d25ed1c5
Looks like we unintentionally had a breakage after aosp/1595302, where
both GWP-ASan and MTE tests started failing because the extra
information wasn't plumbed through the tombstones. MTE has end-to-end
tests but aren't run continuously, and GWP-ASan was missing the e2e
tests.
Also remove some unique wording for GWP-ASan, a UaF on the free'd
pointer is now "0 bytes into a 16-byte allocation" instead of "on a
16-byte allocation". The former is more descriptive and is more
ubiquitously used in our tooling.
This patch adds the E2E tests, but the underlying problem needs to be
fixed as well, before this patch can land.
Bug: 182489365
Test: atest debuggerd_test
Change-Id: I0fe8aba7ea443b3071724987f46b19a6525cda3c
In order to test the platform in emulators that are orders of magnitude
slower than real hardware we need to be able to avoid hitting timeouts
that prevent it from coming up properly. For this purpose introduce
a system property, ro.hw_timeout_multiplier, which may be set to
an integer value that acts as a multiplier for various timeouts on
the system.
Bug: 178231152
Change-Id: I6d7710beed0c4c5b1720e74e7abe3a586778c678
Merged-In: I6d7710beed0c4c5b1720e74e7abe3a586778c678
Application developers would like to know how long their process has
been alive for to distinguish between crashes that happen immediately
upon startup and crashes in regular operation.
Test: manual
Change-Id: Ia31eeadfcced358b478c7a7c7bb2e8a0252e30f4
We're running into timeouts from death tests because we're ~doubling the
cost of crash dumping by doing it twice.
Bug: http://b/180605583
Test: treehugger
Change-Id: If5b40434171323a09960b70af0124ec08bd3fbe8
On cuttlefish, the number of tombstones allowed is much larger
than 50, so change the algorithm to search for any tombstone
file.
Test: Ran unit tests on cuttlefish with > 50 tombstones.
Test: Ran unit tests on device.
Change-Id: Ia1d885fe19a7f7751fe3386d40b48750d1e21bd5
With this change we can report memory errors involving secondary
allocations. Update the existing crasher tests to also test
UAF/overflow/underflow on allocations with sizes sufficient to trigger
the secondary allocator.
Bug: 135772972
Change-Id: Ic8925c1f18621a8f272e26d5630e5d11d6d34d38
We were already doing this for the text tombstones but not for protos,
which meant that we stopped producing protos once we hit the limit
on the number of tombstones. Move the code for the text tombstones
into a common location and call it for both types.
Change-Id: I4951150da51a32d50821d147458fc5c18200c9d4
Otherwise we can fail to find map entries for tagged addresses,
such as those of heap objects.
Bug: 135772972
Change-Id: Ia626b0587c8461eb575b2de5c08562c73ba4a66e
Now that we default to sync MTE in tests, the default tagged_addr_ctrl
in this test needs to be updated.
Bug: 135772972
Change-Id: I9bf6fb29df9799d1ed8c0d8b66f4d2891f487d80
There's no way to atomically unlink a specific file for which we have an fd from
a path, which means that we can't safely delete a tombstone without coordination
with tombstoned, which is risky. For example, if we use flock on the directory,
and system_server crashes while holding the lock, we risk deadlock.
We do the next best thing, and keep a file descriptor around for every
tombstone, and truncate it, which requires system_server to be able to
write to tombstones (which are owned by the system group).
Test: treehugger
Change-Id: I6ba7f1fe87ee1a4b57bdb3741e8ec9fbc80788c9
Respect ro.timeout_multiplier property. Some of these are required for
tombstone writing to work on MTE QEMU, the rest are done speculatively.
Test: add crashing code to system_server, observe the tombstone
Bug: 178231152
Change-Id: Ic86e494af571301df7af07d13a6c046a0da6bda7
libbase logging uses getprogname() to get the default tag, which breaks
for the fallback handler which is statically linked into the dynamic
linker. Switch to libasync_safe for logging.
Test: atest -c CtsSeccompHostTestCases:android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls
Change-Id: Ieeaf33fb26cff4ba7e1589d1d883ac2fcc74cf47
Revert "Let crash_dump read /proc/$PID."
Revert submission 1556807-tombstone_proto
Reason for revert: b/178455196, Broken test: android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls on git_master on cf_x86_64_phone-userdebug
Reverted Changes:
Ide6811297:tombstoned: switch from goto to RAII.
I8d285c4b4:tombstoned: make it easier to add more types of ou...
Id0f0fa285:tombstoned: support for protobuf fds.
I6be6082ab:Let crash_dump read /proc/$PID.
Id812ca390:Make protobuf vendor_ramdisk_available.
Ieeece6e6d:libdebuggerd: add protobuf implementation.
Change-Id: I8a77f6b9e1b42902ef7ee250cc3f1fd341ea0e2b
Revert "Let crash_dump read /proc/$PID."
Revert submission 1556807-tombstone_proto
Reason for revert: b/178455196, Broken test: android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls on git_master on cf_x86_64_phone-userdebug
Reverted Changes:
Ide6811297:tombstoned: switch from goto to RAII.
I8d285c4b4:tombstoned: make it easier to add more types of ou...
Id0f0fa285:tombstoned: support for protobuf fds.
I6be6082ab:Let crash_dump read /proc/$PID.
Id812ca390:Make protobuf vendor_ramdisk_available.
Ieeece6e6d:libdebuggerd: add protobuf implementation.
Change-Id: Ib2403c1b61f6cf0513b76361440fbc5909d7554a
Revert "Let crash_dump read /proc/$PID."
Revert submission 1556807-tombstone_proto
Reason for revert: b/178455196, Broken test: android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls on git_master on cf_x86_64_phone-userdebug
Reverted Changes:
Ide6811297:tombstoned: switch from goto to RAII.
I8d285c4b4:tombstoned: make it easier to add more types of ou...
Id0f0fa285:tombstoned: support for protobuf fds.
I6be6082ab:Let crash_dump read /proc/$PID.
Id812ca390:Make protobuf vendor_ramdisk_available.
Ieeece6e6d:libdebuggerd: add protobuf implementation.
Change-Id: I0c4f3a17e8b06d6c65255388c571ebf11d371dbb
Revert "Let crash_dump read /proc/$PID."
Revert submission 1556807-tombstone_proto
Reason for revert: b/178455196, Broken test: android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls on git_master on cf_x86_64_phone-userdebug
Reverted Changes:
Ide6811297:tombstoned: switch from goto to RAII.
I8d285c4b4:tombstoned: make it easier to add more types of ou...
Id0f0fa285:tombstoned: support for protobuf fds.
I6be6082ab:Let crash_dump read /proc/$PID.
Id812ca390:Make protobuf vendor_ramdisk_available.
Ieeece6e6d:libdebuggerd: add protobuf implementation.
Change-Id: Ia0a1ee57e7630e01c495dc166218f665340aad7f
This reverts commit 675cb30f05.
Reason for revert: b/178455196, Broken test: android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls on git_master on cf_x86_64_phone-userdebug
Change-Id: I82d228f2bc3e6b426d4703732e1c8766815ccc97
* changes:
libdebuggerd: add protobuf implementation.
tombstoned: support for protobuf fds.
tombstoned: make it easier to add more types of outputs.
tombstoned: switch from goto to RAII.
Currently, all MTE failures end up displaying 'Fault address falls at
0x<addr> after any mapped regions'. Clearly when scanning, we should use
the untagged address to figure out which ranges it's in.
I've taken the liberty of removing all si_addr parsing and moving it
into the common ProcessInfo, as well as making it really explicit
whether you want the (possibly tagged) original si_addr, or whether you
want the untagged variant (for scanning /proc/maps or whatever).
This is not particularly easily testable, as ReadCrashInfo isn't easily
injectable and `dump_all_maps` should already be passed the untagged
pointer to scan for. I've tested this locally on FVP under SYNC MTE with
a simple UaF binary and noted the problem is fixed. Given that this is
making the code more clear, I'm hoping the owners see no need for a
regression test :).
Bug: 135772972
Test: On FVP, run 'adb shell MEMTAG_OPTIONS=sync sanitizer-status' and
check that the use-after-free test ends up with the /proc/maps
desription in the right place.
Change-Id: I220e4200c75a72474a95a67e5bbc36173a438dd2
This commit implements protobuf output for tombstones, along with a
translator that should emit bytewise identical output to the existing
tombstone dumping code, except for ancillary data from GWP-ASan and
Scudo, which haven't been implemented yet.
Test: setprop debug.debuggerd.translate.translate_proto_to_text 1 &&
/data/nativetest64/debuggerd_test/debuggerd_test
Test: for TOMBSTONE in /data/tombstones/tombstone_??; do
pbtombstone $TOMBSTONE.pb | diff $TOMBSTONE -
done
Change-Id: Ieeece6e6d1c26eb608b00ec24e2e725e161c8c92
Sadly, it looks like we do still really use libcutils for some of the
socket functions.
Test: treehugger
Change-Id: Ic71f97507c89b10d2f3b7a2971064a9e6b1d349d
Now that the feature guarded by this flag has landed in Linux 5.10
we no longer need the flag, so we can remove it.
Bug: 135772972
Change-Id: I02fa50848cbd0486c23c8a229bb8f1ab5dd5a56f
- Make it apply to every thread, and thus remove the restriction
that it must be called while the program is single threaded.
- Make it change TCF0 itself (on all threads), instead of requiring
callers to do it themselves, which can be error prone.
And update all of the call sites.
Change the implementation of
android_mallopt(M_DISABLE_MEMORY_MITIGATIONS) to call
android_mallopt(M_SET_HEAP_TAGGING_LEVEL) internally. This avoids
crashes during startup that were observed when the two mallopts
updated TCF0 unaware of each other.
I wouldn't expect there to be any out-of-tree callers at this point,
but it's worth noting that the new interface is backwards compatible
with the old one because it strictly expands the set of situations in
which the API can be used (i.e. situations where there are multiple
threads running or where TCF0 hadn't been updated beforehand).
Bug: 135772972
Change-Id: I7746707898ff31ef2e0af01c4f55ba90b72bef51
The discussion on LKML is converging on v16 of the fault address tag
bits patch [1]. In this version of the patch the presence of the tag
bits in si_addr is controlled by a sa_flags bit, and a protocol is
introduced to allow userspace to detect kernel support for sa_flags
bits. Update the tombstone signal handler to use this API to read
the tag bits, update the interceptors in libsigchain to implement
the flag support detection protocol and hide the tag bits in si_addr
from chained signal handlers that did not request them to match the
kernel behavior.
[1] https://lore.kernel.org/linux-arm-kernel/cover.1605235762.git.pcc@google.com/
Change-Id: I57f24c07c01ceb3e5b81cfc15edf559ef7dfc740
It turns out that I had originally written the test with a local
patch applied that forces TCF0 to SYNC, so it was testing for the
wrong tagged_addr_ctrl value. Fix it.
Bug: 135772972
Change-Id: Ibb9b25e5f5635372ad5de7825c31d7264ff02590
This simplifies some of the logic and removes the need to pass an
Arch value to functions that should already know about the arch
it is operating on.
Includes fixes for debuggerd/libbacktrace.
Added new unit tests to cover new cases.
Test: All unit tests pass.
Test: Faked unwinder failing to verify debuggerd error messages display
Test: properly in backtrace and tombstone.
Change-Id: I439fcae0695befcfb1cb4c0a786cc74949d33425
This value indicates whether memory tagging is enabled on a thread,
the mode (sync or async) and the set of excluded tags. This information
can sometimes be important for understanding an MTE related crash,
so include it in the per-thread tombstone output.
Bug: 135772972
Change-Id: I25a16e10ac7fbb2b1ab2a961a5279f787039000b
Until 77fdb22cf6, logd started as
AID_ROOT and then dropped its privileges. Since then, there's been no
reason to use string comparisons rather than checking the uid.
Test: pkill -SEGV logd
Test: treehugger
Change-Id: Ia709f8f59cb0ab9abac7df84c96c701b5d0a83ea
An OEM asks for sub-second granularity, and that's most easily done if
we only have one timestamp generator. I'm not convinced sub-second
granularity is particularly useful myself, and I definitely don't think
that nanosecond resolution is meaningful but I do like this cleanup, and
if I'm going to use sub-second precision I may as well use the maximum
precision available to me.
Also reduce some duplication of code reading cmdline/comm.
Bug: https://issuetracker.google.com/161860597
Test: head /data/tombstones/*
Change-Id: I035ecfd4a3338ccd84dae0ef973a998a7c7c5056
Tags appear in the addresses printed in the memory dump, which seems
like a reasonable place to put them because tagged addresses will
also appear in other places in the tombstone, such as registers and
the fault address.
Bug: 135772972
Change-Id: I52da338347ff6b7503cf5ac80763c540695dc061
Previously, we would do a simple bounds check before deciding
whether to dump the memory around a register. On 64-bit platforms,
the register's value was required to be less than (4 << 60). However,
after stripping tags on AArch64 as part of r.android.com/1365229, all
pointer values became less than (4 << 60), so the check became useless
for filtering out invalid pointers. As a result, we would attempt to
dump memory for all registers, which for a register not containing
a valid pointer would typically consist of 16 lines of dashes.
One possible fix may be to replace the constant (4 << 60) with the
process's actual address space limit (known as TASK_SIZE inside the
kernel; typically 39 bits on AArch64 and 48 bits on x86_64), but the
kernel provides no API for retrieving a process's TASK_SIZE value. We
could guess it by looking at for example the highest bit set in the
value of getauxval(AT_EXECFN), which points to an address on the stack
which typically is mapped at the end of the address space on program
startup, but at least on AArch64 it is possible to dynamically extend
TASK_SIZE at runtime by providing a hint to mmap(), so this is not
always sufficient.
Instead, it seems best to remove most of the early bounds check, and
simply issue ptrace() calls for each register value, bailing out of
the entire output if none of the calls ended up succeeding. This also
has the nice side effect of avoiding 16 lines of noise per register
whose value looks like a pointer but actually points to unmapped
memory. We still retain part of the bounds check in order to avoid
integer overflow during the dump (including overflows into the tag
part of the address on architectures that support tagging).
Bug: 154272452
Change-Id: I94e4b7124b7735b92fd83a49c80ebded3483cd4e
We do not install a 32-bit version of libminijail to 64/32 devices,
which means that "atest -a debuggerd_test" always fails on 32-bit.
Fix it by statically linking libminijail.
Change-Id: I1e5610d1353b4f5b718c1259825421c0c07d7c24
After r.android.com/1288984 we started failing to dump memory contents
for heap addresses because the tag started causing any addresses to
fail this bounds check. Add an untag_address() call to the bounds check
so that the tag is ignored.
Bug: 154272452
Change-Id: I3a6d1a078b21871bd93164150a123549f83289f6
It's impractical to test the contents of the stack trace, but we
should at least test that *a* stack trace is present, which would
have caught the bug fixed by r.android.com/1306754 .
Bug: 135772972
Change-Id: Ic5e0b997caa53c7eeec4e5185df5c043c9d4fe3d
Teach debuggerd to use the new scudo APIs proposed in
https://reviews.llvm.org/D77283 for extracing MTE error reports from crashed
processes, and include those reports in tombstones if possible.
Bug: 135772972
Change-Id: I082dfd0ac9d781cfed2b8c34cc73562614bb0dbb
log/log.h primarily concerns itself with writing logs. The few users
who read logs should directly include log/log_read.h.
Bug: 78370064
Test: build
Change-Id: Ie95c55ea2ffc76fc95768323d445ada6ad4f2520
If crash_dump dies before it gets a chance to write to the pipe we use
to let the debugged-process know that it successfully started, we
weren't cleaning up the child we fork to start it, leaving a zombie
child.
Bug: http://b/152119184
Test: debuggerd_test
Change-Id: Id01cc05f693995e9998941774f74ab8e3d8b4d8a
On aarch64, the top 8 bits of the address (i.e. the tag bits) of
the fault address in si_addr are always clear. This isn't ideal for
MTE which will require these bits in order to correctly diagnose
tag mismatches.
A proposed kernel patch [1] exposes the full fault address including
the tag bits as part of the ucontext. Change debuggerd to read this
fault address if available.
[1] https://patchwork.kernel.org/patch/11435077/
Bug: 135772972
Change-Id: Ia05be574113860f4e9ecc36a310c4b740e0c4afb
GWP-ASan uses frame-pointer based unwinding internally on
allocation/deallocation to collect stack traces that are used when
crashes are reported.
This should be generic, so pull it out into libunwindstack so it can be
used by MTE as well.
Bug: 152412331
Test: atest debuggerd_test
Change-Id: I27b32263aac63446f5fe398af108676b70cd3971
Similar to r.android.com/1247247 I'll be adding more of them for MTE.
Also, change the protocol between the crasher and crash_dump to make
it easier to add new fields and change the referenced data structures
without needing to worry about versioning. The version number for
static executables is now always 1 (where the protocol will never
change), while the version number for dynamic executables is always
4 (where the protocol can change, because the linker and crash_dump
are version locked).
Bug: 135772972
Change-Id: Ib4696d0544d7c87cb429aaaa15f18c3640059e16
We're now using it in contexts that don't have all of the registers available,
such as GWP-ASan and soon MTE, so it doesn't make sense to have it be a
member function of Regs.
Bug: 135772972
Change-Id: I18b104ea0adb78588d7e475d0624cefc701ba52c
- Create a static library libunwindstack_no_dex without DEX support.
- Use it in libdebuggerd_handler_fallback, whose only use is in the
linker, which shouldn't need that support.
- Use it in init_first_stage, which doesn't need DEX support either.
- Also need a libbacktrace_no_dex since it's in the dependency chain
from init_first_stage to libunwindstack_no_dex.
Also restrict the *_no_dex libs and libdebuggerd_handler_fallback as
much as possible to avoid inadvertent use of these reduced
functionality libs.
Test: m init_first_stage on Cuttlefish
where BOARD_BUILD_SYSTEM_ROOT_IMAGE=false
Test: m system_image com.android.runtime
Test: Build & boot
Test: atest linker-unit-tests libunwindstack_unit_test debuggerd_test
Bug: 142944931
Bug: 151466650
Change-Id: Iaacb29bfe602f3ca12a00a712e2a64c45ff0118b
A future change will introduce a version lock between linker and
crash_dump. Move crash_dump into the runtime APEX alongside linker in order to
ensure that they will be the same version even if the runtime APEX is updated.
Bug: 135772972
Change-Id: Ic2eae31b6927eb0e8a62315ac141f50933c00bcc
Merged-In: Ic2eae31b6927eb0e8a62315ac141f50933c00bcc
We're now passing around a couple of addresses for GWP-ASan in addition
to abort_msg_address and fdsan_table_address, and I'm going to need to add
more of them for MTE. Move them into a data structure in order to simplify
various function signatures.
Bug: 135772972
Change-Id: Ie01e1bd93a9ab64f21865f56574696825a6a125f
On userdebug/eng devices, check a system property to see whether we
should create tombstones or not. OEMs that would rather have core dumps
can set this property and configure /proc/sys/kernel/core_pattern
appropriately.
Bug: https://issuetracker.google.com/149663286
Test: set the property, cause a crash
Change-Id: If894b4582a1820b64bdae819cec593b7710cb6e3
GWP-ASan can provide information about a crash that it caused. Grab the
GWP-ASan regions from the globals shared by the linker for crash-handler
purpopses, pull the information from GWP-ASan, and display it.
This adds two regions:
1. Causality tracking by GWP-ASan. We now print a cause header about
the crash, like `Cause: [GWP-ASan]: Use After Free on a 1-byte
allocation at 0x7365bb3ff8`
2. Allocation and deallocation stack traces.
Bug: 135634846
Test: atest debuggerd_test
Change-Id: Id28d5400c9a9a053fcde83a4788f971e677d4643
use Android.bp instead of Android.mk to build and install the
crash_dump.policy files. This also allows mainline modules to pull
the files into their apex (dependency wasn't handled for Android.mk)
Bug: 147914640
Test: build, examine generated filesystem
Change-Id: Iae92d4f9d683ccfddf1716e7eb2877b7bff0c737
This takes a lot of space, isn't convincingly useful, and makes it
likely that the far more valuable stuff that comes after it gets
truncated. So let's just drop it.
Bug: http://b/139860930
Test: manual crasher, presubmit
Change-Id: Ie417ffc07e3cb17e95fdb3d183f8c87de0f34b89
1 page isn't enough to log on AArch64, and clean pages are free, so
increase the stack size to 8 pages.
Bug: http://b/144887737
Test: treehugger
Change-Id: I731b3bc27ab37f4b830a9478a04cd34d4f7648d3
GWP-ASan's crash information retrieval services requires a Printf()
function (declared by the system/implementing allocator). In this
instance, because _LOG is called with additional arguments (the log_t),
this function must be wrapped to conform to printf_t defined by
GWP-ASan.
We can easily wrap the variadic version.
Bug: 135634846
Test: atest debuggerd_test
Change-Id: I17209cd2b7455ce889e2f8194969f606cac329eb
A thread's PSTATE can sometimes be critical for understanding a crash,
especially with MTE and other new features that store per-thread state
in PSTATE.
Bug: 135772972
Change-Id: I1bee25bffe7eea395f04b6449dc9227298cf866e
logger_entry and logger_entry_v2 were used for the kernel logger,
which we have long since deprecated. logger_entry_v3 is the same as
logger_entry_v4 without a uid field, so it is trivially removable,
especially since we're now always providing uids in log messages.
liblog and logd already get updated in sync with each other, so we
have no reason for backwards compatibility with their format.
Test: build, unit tests
Change-Id: I27c90609f28c8d826e5614fdb3fe59bde22b5042
debuggerd_client.race seems to have suddenly started to flake, for no
apparent reason. This doesn't seem to reproduce locally, so increase
the timeouts to rule out our test VMs being slow.
Bug: http://b/142571257
Test: treehugger
Change-Id: Ic54a78b8da36cb1163cec7e7976c73c3da628a30
C++20 wants members to be ordered unlike C99.
Bug: 139945549
Test: mm
Change-Id: I3cbca589511c1e0bbc10c691949e18de77e16031
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
We're missing useful crashes, especially on hwasan builds.
Bug: http://b/140580637
Test: run crasher
Change-Id: Ib5d8d3bd3fc4d7fec77d0b10302e5595f97a3515
There is still some flakiness, so increase the timeout values.
Also remove the TEMP_FAILURE_RETRY macro usage in TIMEOUT calls.
That macro disables the ability of the alarm code to interrupt
the system call.
Bug: 141045754
Test: Unit tests pass.
Change-Id: Ia3c95dccc3076a3fd5ef6432097a57e4ccee4df3
The fdsan code uses getrlimit/ugetrlimit so need to allow that when
running the debuggerd unit tests.
Bug: 141045754
Test: Ran the offending tests hundreds of times without failure.
Change-Id: Iece94f03e7895d61ca8a8f3ab17dce7e54ddf9cd
Catch as many early-boot crashes as we can by starting tombstoned
immediately after /data is mounted.
Bug: http://b/139864948
Test: adb shell su 0 dmesg | grep "starting service"
Change-Id: I7f8821102191a445e87020f3efa59a2e0620d9db
Since only privileged processes with CAP_SYS_ADMIN can read kernel
stack traces from /proc/*/stack, we dump the waiting channels
instead to provide some insight as to where the process might
be stuck in the kernel.
Bug: 135458700
Fixes: 135458700
Test: adb shell am hang; Check /data/anr/<anr-file> for
wchan data.
Change-Id: I9f13511ad89a259ce5e5465155db15d45d2c46d8
Test: Ran new unit tests.
Test: Ran crasher stack-overflow, crasher64 stack-overflow and verified
Test: stack overflow cause is shown.
Test: Ran stack overflow app and verified tombstone includes stack-overflow
Test: message.
Change-Id: I9bb01186dff5ed81c77d84b6aaedb5332ddd7256
This is for Android Telemetry to be able to categorise the processes
that produce tombstones.
Test: atest debugerd_test:TombstoneTest
Change-Id: Ie635347c9839eb58bfd27739050bd68cbdbf98da
Modify the unwinder library to indicate that at least one of the stack
frames contains an elf file that is unreadable.
Modify debuggerd to display a note about the unreadable frame and a possible
way to fix it.
Bug: 129769339
Test: New unit tests pass.
Test: Ran an app that crashes and has an unreadable file and verified the
Test: message is displayed. Then setenforce 0 and verify the message is
Test: not displayed.
Change-Id: Ibc4fe1d117e9b5840290454e90914ddc698d3cc2
There appears to be a kernel bug that causes SIGHUP and SIGCONT to be
sent to the parent process group we spawn from if the process group
contains stopped jobs (e.g. the parent itself, because of wait_for_gdb).
Call setsid in all of our children to prevent this from happening.
Bug: http://b/31124563
Test: adb shell 'setprop debug.debuggerd.wait_for_gdb 1; killall -ABRT surfaceflinger'
Change-Id: I1a48d70886880a5bfbe2deb80d48deece55faf09
Somehow the code was still including this include from libbacktrace.
I think the libbacktrace include directory was coming from some
transitive includes. I verified that nothing in debuggerd is using
the libbacktace.so shared library.
Bug: 120606663
Test: Builds, unit tests pass.
Change-Id: I85c2837c5a539ccefc5a7140949988058d21697a
Update the entries only when the list is modified by the runtime.
Check that the list wasn't concurrently modified when being read.
Bug: 124287208
Test: libunwindstack_test
Test: art/test.py -b --host -r -t 137-cfi
Change-Id: I87ba70322053a01b3d5be1fdf6310e1dc21bb084
Update debuggerd to print BuildId information by default.
Bug: 120975492
Test: New unit tests pass.
Test: debuggerd -b <PID> shows build id information.
Test: tombstones include build id information.
Change-Id: I019b031113d0b77385516223c63455b868924440
If a process is ptraced already, we might not be able to exec crash_dump
due to selinux. Since we can be called for non-fatal events, we
shouldn't abort in that case.
Bug: http://b/128054996
Test: treehugger
Change-Id: I1442041caa7af908df2ab87b9e010c44082e7587
Currently, moving or copying a Maps object leads to double free of MapInfo.
Even moving a Maps object did not prevent this, as after a move
the object only has to be in an "unspecified but valid state", which can
be the original state for a vector of raw pointers (but not for a vector
of unique_ptrs).
Changing to unique_ptrs is the most failsafe way to make sure we never
accidentally destruct MapInfo.
Test: atest libuwindstack_test
Failed LocalUnwinderTest#unwind_after_dlopen which also fails at master.
Change-Id: Id1c9739b334da5c1ba532fd55366e115940a66d3
It should be dlopen'ed lazily by libdexfile_support now.
Also change debuggerd_test to not link libunwindstack and its dependencies
statically - the static libs can overlap with the dynamic ones.
Test: mmma system/core/debuggerd/
Test: atest debuggerd_test
Test: mmma system/core/{libunwindstack,libbacktrace}, run host gtests (cannot get atest to work)
Bug: 124827589
Bug: 123186083
Change-Id: I9e7bf9bcbae499af4e1be4c9854bce441e2a7b55
This is necessary since the dynamic one is now using dlopen(), which isn't
available in static builds.
Test: m
Test: mmma system/core/{libunwindstack,libbacktrace}, run host gtests (cannot get atest to work)
Bug: 123403798
Bug: 123186083
Change-Id: I06a9cdfe7e7cc01427ffd54b66c8ebab88782260
Small modifications to the dump_stack method and added unit tests to
verify the output.
Bug: 120606663
Test: Unit tests pass, debuggerd run on processes on target.
Change-Id: Id385a915b751abda3dd6baebed6c3ce498c3bf6e
With our method returning 'bool', a "return -1" is interpretted
as 'true'. We change this to an explicit 'false', as desired.
Test: TreeHugger
Change-Id: I222858b797bc4242a2dc6d4fe81df3d2586d055a
This reverts commit 444e23d2fc.
The rest of the topic doesn't need to be reverted.
Reason for revert: Breaks renderscript on marlin and sailfish.
Test: Manual repro of http://b/121110092#comment1 on reported branch
Test: "atest CtsRenderscriptTestCases" on that branch
Test: mmma system/core/{libunwindstack,libbacktrace}, run host gtests
Test: Make image, flash, and reboot device.
Bug: 121110092, 119632407
Change-Id: If1976b19ce386c95bc5bd4fd6d523745c167de18
There is a problem about tombstone, which it will fail to
generate tombstone file in some scenarios due to socket
communication exception.
Reproduce step:
step 1: reboot device
step 2: ps -ef |grep zygote , get the pid of zygote64
(Attention: zygote64 should never been killed or reboot,
otherwise we can get the tombstone file)
step 3: kill -5 pid of zygote64
step 4: cd data/tombstones/, and could not find the tombstone
file of zygote64.
[Cause Analysis]
1. There are following logs by logcat:
11-19 15:38:43.789 569 569 F libc : Fatal signal 5 (SIGTRAP),
code 0 (SI_USER) in tid 569 (main), pid 569 (main)
11-19 15:38:43.829 6115 6115 I crash_dump64: obtaining output
fd from tombstoned, type: kDebuggerdTombstone
11-19 15:38:43.830 569 5836 I Zygote : Process 6114 exited
cleanly (0)
11-19 15:38:43.830 777 777 I /system/bin/tombstoned: received
crash request for pid 569
11-19 15:38:43.831 6115 6115 I crash_dump64: performing dump of
process 569 (target tid = 569)
...
11-19 15:38:43.937 777 777 W /system/bin/tombstoned: crash
socket received short read of length 0 (expected 12)
2. The last log was print by function of crash_request_cb in
file of tombstoned.cpp, following related code:
rc = TEMP_FAILURE_RETRY(read(sockfd, &request, sizeof(request)));
if (rc == -1) {
PLOG(WARNING) << "failed to read from crash socket";
goto fail;
} else if (rc != sizeof(request)) {
LOG(WARNING) << "crash socket received short read of length " << rc << " (expected "
<< sizeof(request) << ")";
goto fail;
}
Tombstoned read message by socket, and now the message length is
zero. Some socket communication exception occurs at that time.
We try to let crash_dump resend the socket message when the
communication is abnormal. Just as this CL.
Test: 1 reboot device
2 ps -ef |grep zygote , get the pid of zygote64
(Attention: zygote64 should never been killed or reboot,
otherwise we can get the tombstone file)
3 kill -5 pid of zygote64
4 cd data/tombstones/, and could find the tombstone file of
zygote64.
Change-Id: Ic152b081024d6c12f757927079fd221b63445b18
Make XOM related crashes a little less mysterious by adding an abort
cause explaining the crash.
Bug: 77958880
Test: Abort cause in tombstone for a XOM-related crash.
Change-Id: I7af1bc251d9823bc755ad98d8b3b87c12bbaecba
Previously, when we received simultaneous dump requests, we were CASing
a file descriptor value into a variable, and then failing to close it
if the CAS failed.
Bug: http://b/118412443
Test: debuggerd_test
Change-Id: I075c35a239426002eb9416da3d268c3d1a18e9d2
TEMP_FAILURE_RETRY's result was unused for the call to read(), so now
mark it as such to silence a possible unused result warning. For
__read_chk(), this function is an internal implementation detail of
FORTIFY in Bionic. Under clang-tidy, FORTIFY checks are actually
removed, so this now results in an unknown function being called. The
code should not be explicitly depending on an implementation detail, but
we can just suppress the failing case to retain test coverage of the
actual implementation.
Bug: http://b/110779387
Test: Build using WITH_TIDY=1
Change-Id: If83ac1d6f3b6dc32c0d0fb56d8e675e53b586f78
Previously, if an intercept ends before we ask for a file descriptor
when doing a backtrace, we'll create a tombstone file instead.
Bug: http://b/114139908
Bug: http://b/115349586
Test: debuggerd_test32
Change-Id: I23c7bb8ae5a982a4374a862d0a4f17bee03eb1d9
system_server is sometimes failing to dump with the following error:
libdebuggerd_client: received packet of unexpected length from tombstoned: expected 128, received -1
Improve the logging to try to figure out what's going on.
Bug: http://b/114139908
Test: treehugger
Change-Id: Iee1bdc0891b9fc7bd80a330495ec22a530febddb
Make it possible for code such as fdsan that generates debugging
tombstones via raise(DEBUGGER_SIGNAL) to pass an abort message as well.
Bug: http://b/112770187
Test: debuggerd_test
Change-Id: Idc34263241c18033573e466da3a45aa6f716ddb3
This commit only prints the raw value of the owner tag, pretty-printing
will come in a follow-up commit.
Test: debuggerd `pidof adbd`
Test: static_crasher fdsan_file + manual inspection of tombstone
Change-Id: Idb7375a12e410d5b51e6fcb6885d4beb20bccd0e
Pass the address of the fdsan table down to crash_dump so that we can
dump the fdsan table along with the open file descriptor list.
Test: debuggerd_test
Test: manually ran an old static_crasher
Change-Id: Icbac5487109f2db1e1061c4d46de11b016b299e3
adbd has been built as a static executable since the same binary was
copied to the recovery partition where shared library is not supported.
However, since we now support shared library in the recovery partition,
adbd is built as a dynamic executable.
In addition, the dependency from adbd to libdebuggerd_handler is removed
as debuggerd is handled by the dynamic linker.
A few more modules in /system/core are marked as recovery_available:
true as they are transitive dependencies of the dynamic linker.
This change also includes ld.config.recovery.txt which is the linker
config file for the recovery mode. It is installed to /etc/ld.config.txt
and contains linker namespace config for the dynamic binaries under
/sbin.
Bug: 63673171
Test: `adb reboot recovery; adb devices` shows the device ID
Test: Select 'mount /system' in the recovery mode, then `adb shell`.
$ lsof -p `pidof adbd` shows that libm.so, libc.so, etc. are loaded from
the /lib directory.
Change-Id: I363d5a787863f1677ee40afb5d5841321ddaae77
Include the illegal instruction in the header if we get a
SIGILL. Otherwise (since these tend to be one-off bit flips), we don't
usually have any information to try to confirm our suspicion that any
given instance is actually a one-off bit flip.
Also add `SIGILL` as a crasher option to easily generate such crashes.
Before:
signal 4 (SIGILL), code 1 (ILL_ILLOPC), fault addr 0xab1456da
After:
signal 4 (SIGILL), code 1 (ILL_ILLOPC), fault addr 0xab1456da (*pc=0xe7f0def0)
Bug: http://b/77274448
Test: ran crasher
Change-Id: I5f8dedca5eea2b117b1b1e48430214b38e1366ed
adbd (and its dependencies) are marked as recovery_available:true so
that recovery version of the binary is built separately from the one for
system partition. This allows us to stop copying the system version to
the recovery partition and also opens up the way to enable shared
libraries in the recovery partition. Then we can also build adbd as a
dynamic executable.
Bug: 79146551
Test: m -j adbd.recovery
Change-Id: Ib95614c7435f9d0afc02a0c7d5ae1a94e439e32a
Switch from _exit to raising SIGABRT when we recurse in the fallback
handler, so that waiters see an abort instead of a regular exit.
Bug: http://b/79717060
Test: debuggerd_test32
Test: debuggerd_test64
Change-Id: Iddee1cb1b759690adf07bbb8cd0fda2faac87571
* New lld could create files that map to non-zero
offset at run time.
Test: debuggerd_test
Bug: 79590156
Change-Id: I12db0ebef489ba8a1e648a29d214f8d3c3703996
We can't actually link an unlinked file back onto disk if it wasn't
opened with O_TMPFILE. Switch to using a temporary filename instead.
Bug: http://b/77729983
Test: agampe
Change-Id: I1970497114f0056065a1ba65f6358f08b51ec551
70d8f28945 broke a test that was not
expecting to see the new detail about the signal's sender.
Bug: http://b/78594105
Test: ran tests
Change-Id: Idfa3a53b9e664308efdba560ffbb1401c1904530
We have a LOG(FATAL) that can potentially happen before we turn off
SIGABRT. Move the signal handler defusing to the very start of main.
Bug: http://b/77920633
Test: treehugger
Change-Id: I7a2f2a0f2bed16e54467388044eca254102aa6a0
Suicide doesn't change:
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
But homicide now looks like this (this is `sleep 666` killed by
`kill -SEGV` as root:
signal 11 (SIGSEGV), code 0 (SI_USER from pid 4446, uid 0), fault addr --------
Bug: http://b/78594105
Test: manual
Change-Id: I8c2feafba8cc5a3db85e8250004d428a464c5d9e