This is a no-op but will be used in upcoming scudo changes that allow to
change the depot size at process startup time, and as such we will no
longer be able to call __scudo_get_stack_depot_size in debuggerd.
We already did the equivalent change for the ring buffer size in
https://r.android.com/q/topic:%22scudo_ring_buffer_size%22
Bug: 309446692
Change-Id: I761a7602c54a1f8f2d0575c5e011820d8dbaab63
The only way to get a bad architecture value in the protobuf is if
the data was corrupted or an unsupported architecture was added without
the register support.
If the protobuf is corrupted, this is strictly better since it
still produces a tombstone with the data present.
If there is an unsupported architecture, it will still result in a tombstone,
only the registers would not be present. It would also be very obviously
a problem that needs to be fixed. Again, this is strictly better since
the crash in generation is not necessarily visible unless you look at
the log. Here, the data is in the log and in the tombstone.
This also removes the only dependency in this file on the async_safe
library.
Test: Ran unit tests.
Test: Forced an invalid architecture and verified tombstone is present
Test: with error message, and error message printed in the log.
Change-Id: I8e4a2e3f778fafb5b7241c2f23d5f867f1341ed8
Timeouts in tombstoned.cpp and intercept_manager.cpp are scaled
by HwTimeoutMultiplier, but the timeouts in debuggerd_test.cpp
are not, which means the CrasherTest#intercept_timeout test will
fail for any platform that has a high enough HwTimeoutMultiplier.
Bug: 309532789
Test: debuggerd_test.CrasherTest#intercept_timeout
Change-Id: I83cd01e87644c011efa155a32fd5d92cc8a43a95
The new 6.6 kernel headers added a new segv type, SEGV_CPERR. Add this
to the switch statement.
Test: Unit tests pass.
Change-Id: I77eb4748e51c7e7d7291bfd2180b0ccb3b5a6ded
While doing this, refactor the intercept code to be easier to understand.
The primary use case for this is to perform a parallel stack dump (both Java and native) for specific ANRs.
Add tests for all of the different intercept conditions.
Modify the tests to display the error message from the intercept
response if there is an error.
Bug: 254634348
Test: All unit tests pass.
Test: Ran debuggerd on native and java processes.
Test: Created a bugreport without error.
Change-Id: Ic531ccee05b9a470748b815cf109e0076150a0b6
A clang update enabled -Wreorder-init-list by default. Since it doesn't
provide any benefit to the debuggerd code, disable the warning.
Test: Builds without warnings.
Change-Id: I75cfe064ba92c74312ba33f329b1364258eba06c
aosp/2734054 added socket timeouts for nonblocking liblog ops.
seccomp policy was not updated so tests failed when unallowed
socksetopt syscall was made.
Bug: 298420226
Test: atest debuggerd_test
Change-Id: Iace232ec8b94e5d316d344abc5d866fe314607e0
Signed-off-by: Andrei Diea <adiea@google.com>
Check for the log opening failing.
Add the ability to put error messages in the log and tombstone so
that it's clear if the log reading failed in some way.
Adjust test so that if there is a log or if no log exists, the test
will still pass.
Print an <unknown> if the command line is unreadable instead of nothing.
Test: Ran unit tests.
Test: Induced error and verified error message is save in tombstone.
Change-Id: I2fce8078573b40b9fed3cd453235f3824cadb5e3
Commit aosp/1259140 moved fdsan_table into debugger_process_info, which
is populated conditionally. This introduced a bug where the process that
receives BIONIC_SIGNAL_DEBUGGER (35) does not propagate the fdsan_table
pointer to crash_dump:
$ adb shell kill -SIG35 <pid>
$ adb logcat -s DEBUG
E DEBUG : failed to read fdsan table entry 0: I/O error
Fdsan in warn-only mode uses BIONIC_SIGNAL_DEBUGGER[1], so the generated
tombstones don't have any fd ownership info.
Fix it by calling get_process_info() irrespective of the signal being
handled, taking care to preserve the previous behavior of not showing
abort messages set by applications in non-fatal dumps.
Test: debuggerd_test
Test: send SIG35 to arbitrary process and inspect the log and tombstone
Test: crasher fdsan_file
[1] 20ad9129e7/libc/bionic/fdsan.cpp (166)
Change-Id: I76931ca4825e846fc99f26fa590c045130abb850
Also add the missing `.size` directives to all the assembler functions
for slightly improved backtraces.
Test: crasher64 pac; crasher64 bti
Change-Id: I8e0c127cbff56c33637e6ca8f1d927b971951807
When using the bootstrap linker, the get_gwp_asan_callbacks is
not set. Therefore, check it is not nullptr before calling it
during crash processing.
Bug: 284098779
Test: Ran crasher64 using /system/bin/bootstrap/linker64 and verify
Test: debuggerd code does not crash.
Test: All unit tests pass.
Change-Id: Ifc710fe4bef24661700444a1b69432bfc29d580f
The generate.sh script can generate the file, but current policy file does not match it.
And the rules are not appropriate, like missing "sysinfo", causing the
debuggerd_test to fail in system model. So we match the policy to
what it should be.
Test: make debuggerd_test
Change-Id: I57ebd7713f2ab939d01bfefcc7935e234fdd3e13
Signed-off-by: liwentao <liwentao@eswincomputing.com>
Some testing environments can have a test that is sending many
thousands of messages to the log. When this type of process crashes
all of these log messages are captured and can cause OOM errors
while creating the tombstone.
Added a test to verify the log messages are truncated. Leaving this
test disabled for now since it is inherently flaky due to having to
assume that 500 messages are in the log.
Added a test for a newline in a log message since it's somewhat
related to this change.
NOTE: The total number of messages is capped at 500, but if a message
contains multiple newlines, the total messages will exceed 500.
Counting messages this way seems to be in the spirit of the cap,
that a process logging a large message with multiple newlines does
not completely fill the tombstone log data.
Bug: 269182937
Bug: 282661754
Test: All unit tests pass.
Test: The disabled max_log_messages test passes.
Change-Id: If18e62b29f899c2c4670101b402e37762bffbec6
Just noticed some opportunities while skimming.
Test: adb shell debuggerd $(adb shell pidof com.android.systemui)
Test: All unit tests pass (both 32 bit and 64 bit).
Test: Ran unit tests in a loop hundreds of times.
Change-Id: I428d0cf599ed603a21944b084b95594db893cbd5
Make sure that all the threads have started up, otherwise the main part
of the test might not be testing as stressful a situation as expected.
Note that the "race" moniker is still valid because of the debuggerd
timeout.
The test is now faster (405ms) when run under good conditions.
Test: atest 'debuggerd_test:debuggerd_client#race'
Test: Ran debuggerd_client.race 1000 times on its own.
Test: Ran the whole suite of debuggerd unit tests 1000 times.
Change-Id: I487e7654a71df9f1799f09c6f385c929ddf2f234
Also add new unit tests to verify this behavior.
Bug: 276934420
Test: New unit tests pass.
Test: Ran new unit tests without pthread_setname_np call and verified
Test: the tests fail.
Test: Force crash logd and verify log messages are not gathered.
Test: Force crash a logd thread and verify log messages are not gathered.
Change-Id: If8effef68f629432923cdc89e57d28ef5b8b4ce2
liblog can drop data when debuggerd is overloaded, which leads to
truncated tombstones. by adding the count separately, automation can
easily see whether it is dealing with a truncated tombstone or not.
Bug: 269537146
Change-Id: Ia991537efc0d6b57cbff23ee45af6521467aa20d
This change adds a missing pointer dereference to the InterceptResponse
when checking for a size mismatch.
Test: build
Bug: N/A
Change-Id: I88afed6f1c0f33fe237d337b0fb8fc0a0c0e3bac
This adds the missing assembler for riscv64, even though I don't have a
working tombstoned yet to test it with. There's a distinct possibility
we'll be back to fix the test (because although "register 1" is harmless
for the other architectures, it's the ra register on riscv64; the default
link register), but at least this lets us build the test.
I've also simplified all the assembly to be the simplest sequence I
know that writes 0 to address 0 (because if there was a reason to use
so many instructions before, I want to know what it is so I can write
the missing comment!).
Test: treehugger
Change-Id: I10d117eaedf361d9759a450e0973d07c4f97090e
r29 is the stack pointer on mips, but it's x2 on riscv64 (and the git
history shows that this was indeed copy & pasted from the mips code)
and since bionic always sets up a signal stack with sigaltstack() I
doubt the comment was relevant even on mips (but no-one ever used it,
so who'd know?).
While I'm here, stop using decimal arithmetic --- the whole point was to
have each register contain the value that was obviously appropriate for
that register. (riscv64's mips-like mess of registers all over the place
means that's not going to be super readable, but there's no reason to
make it worse.)
Also, even though I personally prefer the 0xdead from the old mips code,
everyone else is using 0xa5a5, so let's make riscv64 match the others.
Test: treehugger
Change-Id: Ibbae821bc0a02e07164147d621e342224528c2c9
This changes the crash export mechanism in Microdroid. For this, we
create module tombstone_handler which exports methods very similar to
tombstoned.h
For Microdroid (detected using prop: ro.hardware): It calls newly
introduces microdroid specific methods to connect/notify completion of
crash.
Individual methods:
connect -> For Android, it would connect to
tombstoned which would send it the fd corresponding to newly created
file on /data/tombstone_ . For Microdroid, we connect to tombstone
server on host via vsock & populate these sockets as the output fd.
crash_dump, in the later case, would directly write on the socket(s).
notify_completion: For Microdroid, it would simply shutdown the vsock
connections.
Note when OS is not Microdroid: It calls corresponding methods of
tombstoned_client, essentially serving as a proxy.
Detailed design: go/vm-tombstone
Test: atest MicrodroidHostTests#testTombstonesAreGeneratedUponUserspaceCrash
Bug: 243494912
Change-Id: I68537b967f2ee48c1647f0f923aa79e8bcc66942
When this was translated to riscv64, someone "fixed" the crashing bugs
that were the whole point of these two functions. Fix them back so they
actually crash, and add the CFI directives.
Test: treehugger
Change-Id: I312c51fa4c893d27b0f4e39383521657a5870a0d
See aosp/2485756 for more details, but this patch introduces a new
protocol between debuggerd and ActivityManager. This new protocol allows
ActivityManager to correctly handle recoverable crashes.
Bug: 270720163
Test: atest CtsGwpAsanTestCases
Change-Id: Icac6262d608dd57a5daf51699064ab28b0c4703f
They're the same script right now, but gdbclient.py is a bit misleading,
even if we're not likely to ever actually remove it.
Test: treehugger
Change-Id: Ic514f98bf13b3e699be4dbad2bafef22d41d9ffd
1. Fixes this test under clang coverage, which is run under presubmit
for TEST_MAPPING files. When we spawn under a minijail, and the
process exited normally (which is the case for recoverable), clang
coverage would use atexit handlers to dump some stuff using banned
prctl's and other syscalls. Instead of allow-listing them all which
sounds like a huge pain, call _exit() which skips those handlers.
2. Extends the invariant testing to make sure that recoverable GWP-ASan
recovers both the first time, and a second time in a different slot.
Bug: N/A
Test: CLANG_COVERAGE=true NATIVE_COVERAGE_PATHS="*" atest debuggerd_test
Change-Id: I6059e21db4c2898b1c9777a00d2a54497d80ef79
Currently, debuggerd tells the teacher that an app that received a fatal
signal. On the playground, dobbing on a process that doesn't actually
need to be killed is considered a friendship-ending move.
Because recoverable GWP-ASan is *supposed* to not crash your app,
suppress this behaviour and don't let ActivityManager know about the
crash.
Bug: N/A
Test: Run a use-after-free in an app that's using recoverable GWP-ASan,
through the 'libc.debug.gwp_asan.recoverable.<app_name>=1' and
'libc.debug.gwp_asan.process_sampling.<app_name>=1' sysprops.
Change-Id: I033ea67d577573df10936e37db7302d4f4bc0069
Recoverable GWP-ASan is a mode landed upstream in
https://reviews.llvm.org/D140173. For more information about why/what it
is, see
https://android-review.git.corp.google.com/c/platform/bionic/+/2394588.
This patch makes debuggerd call the required libc callbacks for GWP-ASan
to recover from the memory corruption. It also adds the functionality
that libart/sigchain eventually ends up calling, which dumps a GWP-ASan
report for the first error encountered.
Test: Build the platform, run sanitizer-status in recoverable mode,
asserting that it doesn't crash but we get a debuggerd report.
Bug: 247012630
Change-Id: I27212f7250844c20a8fd1e961417cdb4e5bd3626
When moving to a proto tombstone, backtraces no longer contain
an offset when a frame is in a shared library from an apk.
Add the offset display again if needed, and add a test to
verify this behavior.
Bug: 267341682
Test: All unit tests pass.
Test: Dumped a process running through an apk to verify the offset
Test: is present.
Change-Id: Ib720ccb5bfcc8531d1e407f3d01817e8a0b9128c
An early return out of this function makes it harder to add new prints
after the memory maps.
Test: m, flash, look at tombstone
Change-Id: Id06e432918d69ac3307761b244473b6b7ab769e8
GWP-ASan changed one of the APIs upstream to now take the fault address
as well. This is to support the recoverable mode.
Add the fault address as well.
Test: gwp_asan_unittest
Bug: N/A
Change-Id: I8a4edd3fad159d91cc036050d330bbb8f9c8d435
This is a no-op but will be used in upcoming scudo changes that allow to
change the buffer size at process startup time, and as such we will no
longer be able to call __scudo_get_ring_buffer_size in debuggerd.
Bug: 263287052
Change-Id: I350421d1fcdf22ce3b8b73780b88c1e10fa8a074
The current logging...
```
F libc : Fatal signal 31 (SIGSYS), code 1 (SYS_SECCOMP) in tid 6640 (logcat), pid 6640 (logcat)
```
...isn't super useful if crash_dump then fails, because you have no idea
what syscall caused the problem.
We already include the fault address in this line for relevant cases,
so include the syscall number in this case.
Bug: http://b/262391724
Test: treehugger
Change-Id: I45ad7d99c9904bab32b65efeb19be232e59ab3a4
* Intentional crash test code with null/free/escape warnings.
Test: make tidy-system-core-debuggerd_subset
Change-Id: Ib1255c17a374729c82aa246c6a59156dbc4e1b77
The long term plan is to completely remove tombstoned from microdroid (b/243494912), however it might take time some time to implement it.
In the meantime, we've recently removed cgroups support from the microdroid kernel. This means that starting a tombstoned results in a bunch of non-fatal errors in the logs that are related to the fact that tombstoned service specifies task_profiles.
To get rid of these error messages we temporary add a microdroid variant of the tombstoned (tombstoned.microdroid) that doesn't specify task_profiles.
Bug: 239367015
Test: microdroid presubmit
Change-Id: Ia7d37ede2276790008702e48fdfaf37f4c1fd251
Fixes
```
out/soong/installs-aosp_riscv64.mk:56833: error: overriding commands for target `out/target/product/generic_riscv64/system/etc/seccomp_policy/crash_dump.riscv64.policy', previously defined at out/soong/installs-aosp_riscv64.mk:56829
```
Test: m
Change-Id: I78a1c6b10dac2da704515f33b492ff37cc086dd6
The use of __builtin_abort in CrasherTest::Trap breaks with
-ftrap-function=abort, because then the argument of Trap is no longer in
the first argument register at the time of crash.
This flag is added when *any* sanitizer is enabled on the target, even harmless
ones like memtag-heap. See sanitize.go:769.
Fix CrasherTest::Trap to be a little more reliable.
Test: debuggerd_test with SANITIZE_TARGET=memtag_heap
Change-Id: I150f1c0355bd6f2bfabfa5a7bba125acdde1120e
Signed-off-by: Liu Cunyuan <liucunyuan.lcy@linux.alibaba.com>
Signed-off-by: Mao Han <han_mao@linux.alibaba.com>
Change-Id: Ie22c2895fc30fab68eddc18713c80e403f44b203
Signed-off-by: Chen Guoyin <chenguoyin.cgy@linux.alibaba.com>
Signed-off-by: Mao Han <han_mao@linux.alibaba.com>
Change-Id: Ie58bd7cf5dde792d8fba78602b5f53471752ab24
Signed-off-by: Xia Lifang <lifang_xia@linux.alibaba.com>
Signed-off-by: Mao Han <han_mao@linux.alibaba.com>
Change-Id: I521c6da61cf2f6f67a73febf368068c430d94cdb
This uses an std::string, which causes a heap allocation, which is not
async-safe.
Test: atest --no-bazel-mode permissive_mte_test
Change-Id: I4bd53d42d9a6a659abe62a964f14c81d9ec059d0
Unify all our "noinline" variants to the current most common one, not
least because the new [[noinline]] syntax is fussier about where it goes.
Test: treehugger
Change-Id: Icfcb75c9d687f0f05c19f66ee778fd8962519436
There's a link here from the javadoc, but a link to the javadoc from
here seems like a good idea.
Test: N/A
Change-Id: I89a29f72d086d08174e72f7d0aa0421fe417f733
liblog_for_runtime_apex is a static variant of liblog which is
explicitly marked as available to the runtime APEX. Any static
dependency to liblog from inside the runtime APEX is changed from liblog
to liblog_for_runtime_apex.
Previously, to support the need for using liblog inside the runtime
APEX, the entire (i.e. both static and shared variants) liblog module
was marked as available to the runtime APEX, although in reality only
the static variant of the library was needed there. This was not only
looking dirty, but also has caused a problem like b/241259844.
To fix this, liblog is separated into two parts. (1) liblog and (2)
liblog_for_runtime_apex. (1) no longer is available to the runtime APEX
and is intended to be depended on in most cases: either from the
non-updatable platform, or from other APEXes. (2) is a static library
which is explicitly marked as available to the runtime APEX and also
visible to certain modules that are included in the runtime APEX.
Bug: 241259844
Test: m and check that liblog depends on stub library of libc
Change-Id: I10edd4487a6f090ef026acffe1ffbd067387a0d3
r.android.com/2108505 was intended to fix a crash in Scudo in
the case where the stack depot, region info or ring buffer were
unreadable. However, it also ended up introducing a number of bugs into
the code. It failed to call __scudo_get_error_info if the page at the
fault address was unreadable. This can happen in legitimate crash cases
if a primary allocation was close to the boundary of a mapped region,
or if the allocation was a secondary allocation with guard pages. It
also used long as the type for tags, whereas Scudo expects it to be
char. In combination this ended up causing most of the MTE tests to
fail. Therefore, mostly revert that change.
Fix the original crash by null checking the pointers returned by
AllocAndReadFully before proceeding with the rest of the function.
Bug: 233720136
Change-Id: I04d70d2abffaa35fe315d15d9224f9b412a9825d
Added SPDX-license-identifier-Apache-2.0 to:
debuggerd/test_permissive_mte/Android.bp
Bug: 68860345
Bug: 151177513
Bug: 151953481
Test: m all
Change-Id: Ic48cf8a972147eba8a955136be74204c013ca436
In the fallback path, if the non-main thread is the target
to be dumped, then no other threads are dumped when creating
a tombstone. Fix this and add unit tests to verify that
this all threads, including the main thread are dumped.
Bug: 234058038
Test: All unit tests pass.
Test: debuggerd -b media.swcodec process
Test: debuggerd media.swcodec process
Change-Id: Ibb75264f7b3847acdbab939a66902d986c0d0e5c