The code doesn't properly check if data is not read properly, so
make it fail if reads fail. Also, change the algorithm so that
first try and read the faulting page then 16 pages before and 16
pages after. Rather than trying to read every one of these pages,
stop as soon as one is unreadable. This means that the total memory
passed to the scudo error function is all valid data, rather than
potentially being some uninitialized memory.
Added new unit tests to cover scudo address processing.
Bug: 233720136
Test: All unit tests pass.
Test: atest CtsIncidentHostTestCases
Change-Id: I18a97bdee9a0c44075c1c31ccd1b546d10895be9
This is not meant to be enabled long-term, but can be used to assess system
stability with MTE before enabling it.
Bug: 202037138
Change-Id: I9fb9b63ff94da2de0a814fd7150f51559d3af079
If a process requires executing fallback unwinder and the thread
crashing is not the main thread, the wrong unwinder is used.
Fix this case, and add a new unit test that causes an abort in
the non main thread.
Bug: 233721755
Test: New unit test passes with fix and fails without.
Test: Ran debuggerd on swcodec process and it still dumps all threads.
Change-Id: I70fffc5d680256ce867e7a1d427593b584259160
Merged-In: I70fffc5d680256ce867e7a1d427593b584259160
(cherry picked from commit 2d5d46ca85)
This simplifies most of the calls to avoid doing any Android
specific code.
Bug: 120606663
Test: All unit tests pass.
Change-Id: I511e637b9459a1f052a01e501b134e31d65b5fbe
With the addition of runtime-configurable GWP-ASan, there might be many,
many more than 1,000 allocations. Have support for them, but keep a
hopefully-won't-crash-the-device limit.
Bug: 219651032
Test: atest bionic-unit-tests
Change-Id: I7b8e2bf5ab7c723ab6c61365f0dc610e400dbbce
One is intentionally seeting the abort message. The other is to set
the abort message to null.
Also, make the libseccomp_policy static so that the crasher
executable can be copied to the system afterwards without
requiring libseccomp_policy.so.
Test: Ran both new crash commands on device.
Test: Ran the seccomp crash command to verify seccomp still works.
Change-Id: I255b5f37e6eb188719e5b72302ca3f5911c8d821
Inline definitions of a few constants that don't appear on Q/R devices,
so that this works for us in mainline modules that are loaded on those
older devices.
Bug: 225406881
Test: boot on Q, watch logcat
Test: boot on R, watch logcat
Change-Id: Ic5781976d4c1e2d16e230c015fc49d9fde74e289
The functionality moved from the Unwinder object to the MapInfo
object and means that the individual unreadable files can be
displayed now.
Included adding the unreadable elfs per thread in the protobuf.
Updated the unwinder test.
Test: All unit tests pass.
Change-Id: I7140bde16938736da005f926e10bbdb3dbc0f6f5
When dumping a tombstone using the fallback path, only the main
thread was showing up. Modify the code to dump the threads using
a slightly different path for the tombstone generation code.
In addition, while looking at this code, two MTE variables were
not set in the tombstone fallback code. Added those variables
so MTE devices will work properly in this fallback path.
Modified the tombstone unit tests for seccomp to have
multiple threads and verify those threads show up in the tombstone.
Bug: 208933016
Test: Ran unit tests.
Test: Ran debuggerd <PID> on a privileged process and verified
Test: all threads dumped. Also verified that the tagged_addr_ctrl
Test: variable is present on the raven device.
Change-Id: I16eadb0cc2c37a7dbc5cac16af9b5051008b5127
debuggerd_test depends on it, and the easiest way to
ensure that the file is available when running the tests
is to make it a dependency of crash_dump.
Change-Id: Iebea8e0c49d8d49d52a434e4194e870793758988
Change use of new_ to old_ to save the old sigaction data. This hasn't
caused any issues, but it's obviously wrong.
Test: Ran unit tests on coral.
Change-Id: I96be5b0980c323c3aeafb422fbc06202577604a2
Hard to get otherwise if you're trying to debug PAC issues.
Bug: http://b/214314197
Test: treehugger
Change-Id: I2e5502809f84579bf287364e59d6e7ff67770919
The frame data no longer contains map_XXX fields which represent
the map data. Now there is only a shared pointer to the MapInfo
object with which this frame is associated.
Bug: 120606663
Test: Unit tests pass.
Change-Id: I89282963f742f6fcc07e48533da4108dc16bdce9
It is expensive to keep the non-protobuf path around and it hasn't
been used for an entire release without anyone noticing, so remove it.
Create new end-to-end unit tests that cover tests of the non-proto
code paths that are being deleted.
Bug: 197981919
Test: Unit tests pass.
Change-Id: Ia1c45572300bd63e5f196ad61e5e5386830c8ece
- Use "likelihood" instead of "probability" since that has connotations
of being less precise, and our probability ordering isn't very precise
anyway.
- Hide the fault address with SEGV_MTEAERR because it is not available.
- Pad the fault address with leading zeroes to make it clearer which
bits of the top byte (and any following bytes such as PAC signature
bits) are set.
Bug: 206015287
Change-Id: I5e1e99b7f3e967c44781d8550bbd7158eb421b64
On the main thread, the siginfo pointer will never be nullptr.
Add a CHECK to make sure this is true.
Test: Unit tests pass both 32 bit and 64 bit.
Test: Ran with debug.debuggerd.translate_proto_to_text set to 0
Test: to exercise old path.
Change-Id: I9d5ed0de5d652de8a4f9cd85eb57cbb1ec676404
This code was added, but a svelte config still tries to use scudo
related code that doesn't exist.
Bug: 201007100
Test: Ran unit tests on normal config.
Test: Ran unit tests on svelte config.
Change-Id: Ic84bae37717d213121aef182bac2f82dbee25213
strerror is nice, but usually I don't care about the text, I care about
the uppercase enum
Bug: N/A
Test: N/A
Change-Id: I8ea86220cb04cbded701379c47b8aba8ea8864b8
I was here because we have a case where timeout(1) kills logcat, but
debuggerd alleges that the process that was killed had started less than
a second ago. I'm not sure this is the problem there, but I did notice
that far too many tombstones were claiming improbably short process
uptimes. It turns out that the code was measuring the *thread* uptime,
not the *process* uptime.
Also simplify the code a bit by switching to sysinfo(2) rather than
reading a file.
Test: manual, plus the existing unit test
Change-Id: Ie2810b1d5777ad9182be92bfb3f60795dc978b24
The libunwindstack code will attempt to dlopen the libdexfile.so
when a dex pc is found. Unfortunately, this failed since that
library was not properly listed as a runtime library. To make
sure this doesn't happen again, add an end to end test that
will create a dex pc frame, and will verify the correct
dex function name is in that frame.
Bug: 199043576
Test: Unit test passes on arm/aarch64/x86/x86_64.
Test: Removed the runtime_libs of libdexfile from libunwindstack
Test: and verified the new test fails.
Change-Id: I3a11f9ee44e06e37a547d193b04f7fbb90ccfe0a
The output.text.fd value is only ever -1 when there is a failure.
There is no need to check both < 0 or -1, so only check for -1.
Test: Unit tests pass.
Test: Verified the message is seen on intercept and not on
Test: regular crashes.
Change-Id: I1eddcd5d2342b268ceb261b246c98b10cee85bb4
Revert "Allow visibility on libdexfile for all libdexfile_suppor..."
Revert "Add libdexfile runtime dependency of libdexfile_support."
Revert "Add libdexfile runtime dependency of libdexfile_support."
Revert submission 1810760-libdexfile-runtime-2
Reason for revert: DroidMonitor: Potential culprit for Bug 198352910 - verifying through Forrest before revert submission. This is part of the standard investigation process, and does not mean your CL will be reverted.
Reverted Changes:
If4da968e4:Add libdexfile runtime dependency of libdexfile_su...
I80162942a:Allow visibility on libdexfile for all libdexfile_...
Iab18abc8e:Add libdexfile runtime dependency of libdexfile_su...
I473d146d8:Add libdexfile runtime dependency of libdexfile_su...
Change-Id: Iacab8e0a5c74e0c3185a155e35c28903aa9acb4a
When the switch was made to dump the tombstone from the protobuf,
the fault address marker in the maps section went missing. Re-add
that logic and add new unit tests to verify all of the different
behaviors.
Bug: 193935960
Test: All unit tests pass.
Test: All unit tests pass when setprop debug.debuggerd.translate_proto_to_text 0
Test: The above on cuttlefish, 32 bit and 64 bit.
Test: The above on a flame, 32 bit and 64 bit.
Change-Id: I098bb6ab4bacacae2ca0fc5ec9a73549ed0b9489
The "missing output fd" message can seem like an error, so modify
the message to indicate what is really happening. This message
will occur normally when running the debuggerd command, or when
a bugreport is generated, or when an ANR occurs. In all of those
cases, this is not an error, but an expected action.
Bug: 196189981
Test: Ran debuggerd -b and debuggerd and verified this message is seen.
Test: Ran unit tests.
Change-Id: I6e3d5a76d92b972c77fca301ea7147745bc67c37
The tombstone will add a newline after the abort message, so remove
any trailing newlines before saving/printing.
Bug: 196414062
Test: Unit tests pass.
Test: Set system property debug.debuggerd.translate_proto_to_text to 0
test: and unit tests still pass.
Change-Id: I0d3dc215eb5d8be93d99e5b9d4f0a14b1d61396d
A lot of things had moved out of system/core/ without their TEST_MAPPING
entries having gone with them, reducing the amount of presubmit coverage
for those things.
In order to reduce the likelihood of that happening again, I've pushed
all that remained in the system/core/ TEST_MAPPING down into the
individual subdirectories.
Test: treehugger
Change-Id: Ib75d65f9200fa64ae1552471da6fbe5b7023cf94
After compiler update, infinite side effect free loops are replaced with trap
instructions. So use -fno-finite-loop to disable this behavior.
Bug: 196162833
Test: run debuggerd_test.
Change-Id: I057263360a5df64af18c17a025fab48887d0b470
When running debuggerd from the command line, it's possible that
the signal will happen on a side thread. The original intercept
in tombstoned is set to only handle crashes from the main thread
pid, so in this case, the intercept doesn't occur. To fix this,
modify the code so that running debuggerd always sends the signal
to the main pid. In addition, modify the signal handler is entered
due to the BIONIC_SIGNAL_DEBUGGER signal, then the crashing tid is
set to the main thread pid instead of the current thread.
Add unit test to cover this case.
Bug: 194346289
Test: All unit tests pass.
Test: Verify the new unit test is getting the signal on the non-main
Test: thread and still properly handling the intercept.
Test: Modify the debuggerd code to send the signal to the non main pid
Test: and verify the dump still occurs correctly.
Change-Id: I2dd1bd11fc8ef4a6fe87f05ecc67ae349a101c82
For the new kernel 5.13 heders, there is a new TRAP_PERF value that
needs to be handled.
Test: Builds.
Change-Id: I2c6658ca94423c210db9ad6692ec69f6be69b3f5
Dumping stack in ANR can fail, but error message is only printed
to logcat. To allow easier debugging of such cases we add the
error messages in the ANR file as well.
Also factor out some duplication, inline single-call functions to
their call sites, and make some of the lambdas clearer by only having
implicit state unrelated to their primary purpose in captures but
passing as arguments things they fundamentally need to do their job
(and actually adding some duplication for time_left() which was subtle
enough to fool me into thinking that we only needed one call of
set_timeout(), which I've renamed to update_timeout()).
Bug: 191172191
Test: Manual
Change-Id: I39a50ca5b72059bfeff48b010d3be44f19eb32fa
writepid command usage to join a cgroup has been deprecated in favor
of a more flexible approach using task_profiles. This way cgroup path
is not hardcoded and cgroup changes can be easily made. Replace
writepid with task_profiles command to migrate between cgroups.
Bug: 191283136
Test: build and boot
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I945c634dfa7621437d8ea3981bce370d680b7371
Using thread cache will cause SIGSEGV for 32bit+kernel4.9 device.
Bug: 190579082
Bug: 189803009
Test: run cts -m CtsSeccompHostTestCases
Change-Id: I47b13d02674aadbacd8dac36d8382eed0885413c
Signed-off-by: yidong zhang <yidong.zhang@amlogic.com>
I'm guessing that the original
F crash_dump64: crash_dump.cpp:460] failed to attach to thread 1671, already traced by 0 ()
was probably a race, where there _was_ a tracer but they disappeared?
Whatever, it doesn't seem helpful to show "already traced by nobody",
and we also don't want to clobber errno in the fallthrough case
(previously just where get_tracer() failed, but now also where
get_tracer() returns "nobody").
Bug: http://b/188668580
Test: treehugger
Change-Id: I3fa3b4f7e32531d48dfbb0ef946ff351ed5d9171
All but three files are Apache-2.0 already.
Bug: http://b/191499510
Test: /google/src/files/head/depot/google3/wireless/android/busytown/ayeaye/analyzers/copyright/tools/scan_android_project.sh ~/aosp/system/core/debuggerd/ | grep -v APACHE
Change-Id: I430c3382dd160e398f02470d7053ecea39c98f41
The code in the fallback path calls pthread_key_create when using the
normal thread cache. However, this code is executed out of the linker,
which means that the call doesn't see keys created by the libc version
of pthread_key_create. As of now, simply avoid using the thread cache
to avoid this problem.
Bug: 189803009
Test: debuggerd -b on a media process on a 32 bit Android Go device
Test: and observe no crash.
Test: debuggerd unit tests pass.
Change-Id: I9ca1a55e44d3bb69d49450826d7d64d7a64145c3
(cherry picked from commit 49e5a76544)
This information clearly meets the bar for being dumped to logcat. If we
omit the info, we may confuse the user into thinking that it's not
available at all, especially if it's their first time seeing an MTE
report.
This also adds some functionality to the integration testing library to
pull logcat messages and scan them to make sure the contents are in both
places.
Bug: 187881237
Test: atest debuggerd_test # on QEMU w/ MTE.
Change-Id: Icc17ea45bda7628331cc4812eaad3bc5c949b7a7
This type of error is unlikely and attempting to detect it with MTE
is likely to produce false positive reports. Make sure that this type
of error is not detected by the allocator.
Change-Id: I90676d1a031411d6b725890311317802bc24b459
This does not currently cause any problems but it does block progress
on the referenced bugs.
Bug: 187910671
Bug: 187914588
Test: m droid
Change-Id: I977cd842101187441ddbc873eac25598295aab06
When moving to the proto-ized tombstones, the note about unreadable
elf files in a backtrace got lost. This re-adds it and adds a test
to verify that the note properly shows up.
Bug: 185428454
Test: Ran unit tests.
Change-Id: I1150cc737772e1b79fd73ec5c782caadc4629421
A change was made so that pthread_create is calling
prctl(PR_PAC_RESET_KEYS, ...) on aarch64. It's possible that other
seccomp policies might need to change to allow this.
Test: CrasherTest.seccomp_backtrace passes on aarch64.
Change-Id: I9c4d1b3dca5f19a6285bf904bb942f1f52e42bd0
Talk of "gdb" when we currently mean "gdb or lldb" and will soon mean
"lldb" is starting to confuse people. Let's use the more neutral
"debugger" in places where it really doesn't matter.
The switch from gdbclient.py to lldbclient.py is a change for another
day...
Test: treehugger
Change-Id: If39ca7e1cdf4c8bb9475f1791cdaf201fbea50e0
Bug: http://b/181927912
Clang already has -Wfree-nonheap-object but it became a default warning
with clang-r416183
Test: compile crasher.cpp
Change-Id: Ice532e9f373a628e07acd08a4fc7bfa7cf5d4e08
Proto tombstones were missing tagged fault addresses, tagged_addr_ctrl,
tags in memory dumps and Scudo and GWP-ASan error reports. Since text
tombstones now go via protos, all of these features broke when we
switched to text tombstones generated from protos by default. Fix
the features by adding support for them to the proto format,
tombstone_proto and tombstone_proto_to_text.
Bug: 135772972
Bug: 182489365
Change-Id: I3ca854546c38755b1f6410a1f6198a44d25ed1c5
Looks like we unintentionally had a breakage after aosp/1595302, where
both GWP-ASan and MTE tests started failing because the extra
information wasn't plumbed through the tombstones. MTE has end-to-end
tests but aren't run continuously, and GWP-ASan was missing the e2e
tests.
Also remove some unique wording for GWP-ASan, a UaF on the free'd
pointer is now "0 bytes into a 16-byte allocation" instead of "on a
16-byte allocation". The former is more descriptive and is more
ubiquitously used in our tooling.
This patch adds the E2E tests, but the underlying problem needs to be
fixed as well, before this patch can land.
Bug: 182489365
Test: atest debuggerd_test
Change-Id: I0fe8aba7ea443b3071724987f46b19a6525cda3c
In order to test the platform in emulators that are orders of magnitude
slower than real hardware we need to be able to avoid hitting timeouts
that prevent it from coming up properly. For this purpose introduce
a system property, ro.hw_timeout_multiplier, which may be set to
an integer value that acts as a multiplier for various timeouts on
the system.
Bug: 178231152
Change-Id: I6d7710beed0c4c5b1720e74e7abe3a586778c678
Merged-In: I6d7710beed0c4c5b1720e74e7abe3a586778c678
Application developers would like to know how long their process has
been alive for to distinguish between crashes that happen immediately
upon startup and crashes in regular operation.
Test: manual
Change-Id: Ia31eeadfcced358b478c7a7c7bb2e8a0252e30f4
We're running into timeouts from death tests because we're ~doubling the
cost of crash dumping by doing it twice.
Bug: http://b/180605583
Test: treehugger
Change-Id: If5b40434171323a09960b70af0124ec08bd3fbe8
On cuttlefish, the number of tombstones allowed is much larger
than 50, so change the algorithm to search for any tombstone
file.
Test: Ran unit tests on cuttlefish with > 50 tombstones.
Test: Ran unit tests on device.
Change-Id: Ia1d885fe19a7f7751fe3386d40b48750d1e21bd5
With this change we can report memory errors involving secondary
allocations. Update the existing crasher tests to also test
UAF/overflow/underflow on allocations with sizes sufficient to trigger
the secondary allocator.
Bug: 135772972
Change-Id: Ic8925c1f18621a8f272e26d5630e5d11d6d34d38
We were already doing this for the text tombstones but not for protos,
which meant that we stopped producing protos once we hit the limit
on the number of tombstones. Move the code for the text tombstones
into a common location and call it for both types.
Change-Id: I4951150da51a32d50821d147458fc5c18200c9d4
Otherwise we can fail to find map entries for tagged addresses,
such as those of heap objects.
Bug: 135772972
Change-Id: Ia626b0587c8461eb575b2de5c08562c73ba4a66e
Now that we default to sync MTE in tests, the default tagged_addr_ctrl
in this test needs to be updated.
Bug: 135772972
Change-Id: I9bf6fb29df9799d1ed8c0d8b66f4d2891f487d80
There's no way to atomically unlink a specific file for which we have an fd from
a path, which means that we can't safely delete a tombstone without coordination
with tombstoned, which is risky. For example, if we use flock on the directory,
and system_server crashes while holding the lock, we risk deadlock.
We do the next best thing, and keep a file descriptor around for every
tombstone, and truncate it, which requires system_server to be able to
write to tombstones (which are owned by the system group).
Test: treehugger
Change-Id: I6ba7f1fe87ee1a4b57bdb3741e8ec9fbc80788c9
Respect ro.timeout_multiplier property. Some of these are required for
tombstone writing to work on MTE QEMU, the rest are done speculatively.
Test: add crashing code to system_server, observe the tombstone
Bug: 178231152
Change-Id: Ic86e494af571301df7af07d13a6c046a0da6bda7
libbase logging uses getprogname() to get the default tag, which breaks
for the fallback handler which is statically linked into the dynamic
linker. Switch to libasync_safe for logging.
Test: atest -c CtsSeccompHostTestCases:android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls
Change-Id: Ieeaf33fb26cff4ba7e1589d1d883ac2fcc74cf47
Revert "Let crash_dump read /proc/$PID."
Revert submission 1556807-tombstone_proto
Reason for revert: b/178455196, Broken test: android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls on git_master on cf_x86_64_phone-userdebug
Reverted Changes:
Ide6811297:tombstoned: switch from goto to RAII.
I8d285c4b4:tombstoned: make it easier to add more types of ou...
Id0f0fa285:tombstoned: support for protobuf fds.
I6be6082ab:Let crash_dump read /proc/$PID.
Id812ca390:Make protobuf vendor_ramdisk_available.
Ieeece6e6d:libdebuggerd: add protobuf implementation.
Change-Id: I8a77f6b9e1b42902ef7ee250cc3f1fd341ea0e2b
Revert "Let crash_dump read /proc/$PID."
Revert submission 1556807-tombstone_proto
Reason for revert: b/178455196, Broken test: android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls on git_master on cf_x86_64_phone-userdebug
Reverted Changes:
Ide6811297:tombstoned: switch from goto to RAII.
I8d285c4b4:tombstoned: make it easier to add more types of ou...
Id0f0fa285:tombstoned: support for protobuf fds.
I6be6082ab:Let crash_dump read /proc/$PID.
Id812ca390:Make protobuf vendor_ramdisk_available.
Ieeece6e6d:libdebuggerd: add protobuf implementation.
Change-Id: Ib2403c1b61f6cf0513b76361440fbc5909d7554a
Revert "Let crash_dump read /proc/$PID."
Revert submission 1556807-tombstone_proto
Reason for revert: b/178455196, Broken test: android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls on git_master on cf_x86_64_phone-userdebug
Reverted Changes:
Ide6811297:tombstoned: switch from goto to RAII.
I8d285c4b4:tombstoned: make it easier to add more types of ou...
Id0f0fa285:tombstoned: support for protobuf fds.
I6be6082ab:Let crash_dump read /proc/$PID.
Id812ca390:Make protobuf vendor_ramdisk_available.
Ieeece6e6d:libdebuggerd: add protobuf implementation.
Change-Id: I0c4f3a17e8b06d6c65255388c571ebf11d371dbb
Revert "Let crash_dump read /proc/$PID."
Revert submission 1556807-tombstone_proto
Reason for revert: b/178455196, Broken test: android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls on git_master on cf_x86_64_phone-userdebug
Reverted Changes:
Ide6811297:tombstoned: switch from goto to RAII.
I8d285c4b4:tombstoned: make it easier to add more types of ou...
Id0f0fa285:tombstoned: support for protobuf fds.
I6be6082ab:Let crash_dump read /proc/$PID.
Id812ca390:Make protobuf vendor_ramdisk_available.
Ieeece6e6d:libdebuggerd: add protobuf implementation.
Change-Id: Ia0a1ee57e7630e01c495dc166218f665340aad7f
This reverts commit 675cb30f05.
Reason for revert: b/178455196, Broken test: android.seccomp.cts.SeccompHostJUnit4DeviceTest#testAppZygoteSyscalls on git_master on cf_x86_64_phone-userdebug
Change-Id: I82d228f2bc3e6b426d4703732e1c8766815ccc97
* changes:
libdebuggerd: add protobuf implementation.
tombstoned: support for protobuf fds.
tombstoned: make it easier to add more types of outputs.
tombstoned: switch from goto to RAII.
Currently, all MTE failures end up displaying 'Fault address falls at
0x<addr> after any mapped regions'. Clearly when scanning, we should use
the untagged address to figure out which ranges it's in.
I've taken the liberty of removing all si_addr parsing and moving it
into the common ProcessInfo, as well as making it really explicit
whether you want the (possibly tagged) original si_addr, or whether you
want the untagged variant (for scanning /proc/maps or whatever).
This is not particularly easily testable, as ReadCrashInfo isn't easily
injectable and `dump_all_maps` should already be passed the untagged
pointer to scan for. I've tested this locally on FVP under SYNC MTE with
a simple UaF binary and noted the problem is fixed. Given that this is
making the code more clear, I'm hoping the owners see no need for a
regression test :).
Bug: 135772972
Test: On FVP, run 'adb shell MEMTAG_OPTIONS=sync sanitizer-status' and
check that the use-after-free test ends up with the /proc/maps
desription in the right place.
Change-Id: I220e4200c75a72474a95a67e5bbc36173a438dd2
This commit implements protobuf output for tombstones, along with a
translator that should emit bytewise identical output to the existing
tombstone dumping code, except for ancillary data from GWP-ASan and
Scudo, which haven't been implemented yet.
Test: setprop debug.debuggerd.translate.translate_proto_to_text 1 &&
/data/nativetest64/debuggerd_test/debuggerd_test
Test: for TOMBSTONE in /data/tombstones/tombstone_??; do
pbtombstone $TOMBSTONE.pb | diff $TOMBSTONE -
done
Change-Id: Ieeece6e6d1c26eb608b00ec24e2e725e161c8c92
Sadly, it looks like we do still really use libcutils for some of the
socket functions.
Test: treehugger
Change-Id: Ic71f97507c89b10d2f3b7a2971064a9e6b1d349d
Now that the feature guarded by this flag has landed in Linux 5.10
we no longer need the flag, so we can remove it.
Bug: 135772972
Change-Id: I02fa50848cbd0486c23c8a229bb8f1ab5dd5a56f
- Make it apply to every thread, and thus remove the restriction
that it must be called while the program is single threaded.
- Make it change TCF0 itself (on all threads), instead of requiring
callers to do it themselves, which can be error prone.
And update all of the call sites.
Change the implementation of
android_mallopt(M_DISABLE_MEMORY_MITIGATIONS) to call
android_mallopt(M_SET_HEAP_TAGGING_LEVEL) internally. This avoids
crashes during startup that were observed when the two mallopts
updated TCF0 unaware of each other.
I wouldn't expect there to be any out-of-tree callers at this point,
but it's worth noting that the new interface is backwards compatible
with the old one because it strictly expands the set of situations in
which the API can be used (i.e. situations where there are multiple
threads running or where TCF0 hadn't been updated beforehand).
Bug: 135772972
Change-Id: I7746707898ff31ef2e0af01c4f55ba90b72bef51
The discussion on LKML is converging on v16 of the fault address tag
bits patch [1]. In this version of the patch the presence of the tag
bits in si_addr is controlled by a sa_flags bit, and a protocol is
introduced to allow userspace to detect kernel support for sa_flags
bits. Update the tombstone signal handler to use this API to read
the tag bits, update the interceptors in libsigchain to implement
the flag support detection protocol and hide the tag bits in si_addr
from chained signal handlers that did not request them to match the
kernel behavior.
[1] https://lore.kernel.org/linux-arm-kernel/cover.1605235762.git.pcc@google.com/
Change-Id: I57f24c07c01ceb3e5b81cfc15edf559ef7dfc740
It turns out that I had originally written the test with a local
patch applied that forces TCF0 to SYNC, so it was testing for the
wrong tagged_addr_ctrl value. Fix it.
Bug: 135772972
Change-Id: Ibb9b25e5f5635372ad5de7825c31d7264ff02590
This simplifies some of the logic and removes the need to pass an
Arch value to functions that should already know about the arch
it is operating on.
Includes fixes for debuggerd/libbacktrace.
Added new unit tests to cover new cases.
Test: All unit tests pass.
Test: Faked unwinder failing to verify debuggerd error messages display
Test: properly in backtrace and tombstone.
Change-Id: I439fcae0695befcfb1cb4c0a786cc74949d33425
This value indicates whether memory tagging is enabled on a thread,
the mode (sync or async) and the set of excluded tags. This information
can sometimes be important for understanding an MTE related crash,
so include it in the per-thread tombstone output.
Bug: 135772972
Change-Id: I25a16e10ac7fbb2b1ab2a961a5279f787039000b
Until 77fdb22cf6, logd started as
AID_ROOT and then dropped its privileges. Since then, there's been no
reason to use string comparisons rather than checking the uid.
Test: pkill -SEGV logd
Test: treehugger
Change-Id: Ia709f8f59cb0ab9abac7df84c96c701b5d0a83ea
An OEM asks for sub-second granularity, and that's most easily done if
we only have one timestamp generator. I'm not convinced sub-second
granularity is particularly useful myself, and I definitely don't think
that nanosecond resolution is meaningful but I do like this cleanup, and
if I'm going to use sub-second precision I may as well use the maximum
precision available to me.
Also reduce some duplication of code reading cmdline/comm.
Bug: https://issuetracker.google.com/161860597
Test: head /data/tombstones/*
Change-Id: I035ecfd4a3338ccd84dae0ef973a998a7c7c5056
Tags appear in the addresses printed in the memory dump, which seems
like a reasonable place to put them because tagged addresses will
also appear in other places in the tombstone, such as registers and
the fault address.
Bug: 135772972
Change-Id: I52da338347ff6b7503cf5ac80763c540695dc061
Previously, we would do a simple bounds check before deciding
whether to dump the memory around a register. On 64-bit platforms,
the register's value was required to be less than (4 << 60). However,
after stripping tags on AArch64 as part of r.android.com/1365229, all
pointer values became less than (4 << 60), so the check became useless
for filtering out invalid pointers. As a result, we would attempt to
dump memory for all registers, which for a register not containing
a valid pointer would typically consist of 16 lines of dashes.
One possible fix may be to replace the constant (4 << 60) with the
process's actual address space limit (known as TASK_SIZE inside the
kernel; typically 39 bits on AArch64 and 48 bits on x86_64), but the
kernel provides no API for retrieving a process's TASK_SIZE value. We
could guess it by looking at for example the highest bit set in the
value of getauxval(AT_EXECFN), which points to an address on the stack
which typically is mapped at the end of the address space on program
startup, but at least on AArch64 it is possible to dynamically extend
TASK_SIZE at runtime by providing a hint to mmap(), so this is not
always sufficient.
Instead, it seems best to remove most of the early bounds check, and
simply issue ptrace() calls for each register value, bailing out of
the entire output if none of the calls ended up succeeding. This also
has the nice side effect of avoiding 16 lines of noise per register
whose value looks like a pointer but actually points to unmapped
memory. We still retain part of the bounds check in order to avoid
integer overflow during the dump (including overflows into the tag
part of the address on architectures that support tagging).
Bug: 154272452
Change-Id: I94e4b7124b7735b92fd83a49c80ebded3483cd4e
We do not install a 32-bit version of libminijail to 64/32 devices,
which means that "atest -a debuggerd_test" always fails on 32-bit.
Fix it by statically linking libminijail.
Change-Id: I1e5610d1353b4f5b718c1259825421c0c07d7c24
After r.android.com/1288984 we started failing to dump memory contents
for heap addresses because the tag started causing any addresses to
fail this bounds check. Add an untag_address() call to the bounds check
so that the tag is ignored.
Bug: 154272452
Change-Id: I3a6d1a078b21871bd93164150a123549f83289f6
It's impractical to test the contents of the stack trace, but we
should at least test that *a* stack trace is present, which would
have caught the bug fixed by r.android.com/1306754 .
Bug: 135772972
Change-Id: Ic5e0b997caa53c7eeec4e5185df5c043c9d4fe3d
Teach debuggerd to use the new scudo APIs proposed in
https://reviews.llvm.org/D77283 for extracing MTE error reports from crashed
processes, and include those reports in tombstones if possible.
Bug: 135772972
Change-Id: I082dfd0ac9d781cfed2b8c34cc73562614bb0dbb
log/log.h primarily concerns itself with writing logs. The few users
who read logs should directly include log/log_read.h.
Bug: 78370064
Test: build
Change-Id: Ie95c55ea2ffc76fc95768323d445ada6ad4f2520
If crash_dump dies before it gets a chance to write to the pipe we use
to let the debugged-process know that it successfully started, we
weren't cleaning up the child we fork to start it, leaving a zombie
child.
Bug: http://b/152119184
Test: debuggerd_test
Change-Id: Id01cc05f693995e9998941774f74ab8e3d8b4d8a
On aarch64, the top 8 bits of the address (i.e. the tag bits) of
the fault address in si_addr are always clear. This isn't ideal for
MTE which will require these bits in order to correctly diagnose
tag mismatches.
A proposed kernel patch [1] exposes the full fault address including
the tag bits as part of the ucontext. Change debuggerd to read this
fault address if available.
[1] https://patchwork.kernel.org/patch/11435077/
Bug: 135772972
Change-Id: Ia05be574113860f4e9ecc36a310c4b740e0c4afb
GWP-ASan uses frame-pointer based unwinding internally on
allocation/deallocation to collect stack traces that are used when
crashes are reported.
This should be generic, so pull it out into libunwindstack so it can be
used by MTE as well.
Bug: 152412331
Test: atest debuggerd_test
Change-Id: I27b32263aac63446f5fe398af108676b70cd3971
Similar to r.android.com/1247247 I'll be adding more of them for MTE.
Also, change the protocol between the crasher and crash_dump to make
it easier to add new fields and change the referenced data structures
without needing to worry about versioning. The version number for
static executables is now always 1 (where the protocol will never
change), while the version number for dynamic executables is always
4 (where the protocol can change, because the linker and crash_dump
are version locked).
Bug: 135772972
Change-Id: Ib4696d0544d7c87cb429aaaa15f18c3640059e16
We're now using it in contexts that don't have all of the registers available,
such as GWP-ASan and soon MTE, so it doesn't make sense to have it be a
member function of Regs.
Bug: 135772972
Change-Id: I18b104ea0adb78588d7e475d0624cefc701ba52c
- Create a static library libunwindstack_no_dex without DEX support.
- Use it in libdebuggerd_handler_fallback, whose only use is in the
linker, which shouldn't need that support.
- Use it in init_first_stage, which doesn't need DEX support either.
- Also need a libbacktrace_no_dex since it's in the dependency chain
from init_first_stage to libunwindstack_no_dex.
Also restrict the *_no_dex libs and libdebuggerd_handler_fallback as
much as possible to avoid inadvertent use of these reduced
functionality libs.
Test: m init_first_stage on Cuttlefish
where BOARD_BUILD_SYSTEM_ROOT_IMAGE=false
Test: m system_image com.android.runtime
Test: Build & boot
Test: atest linker-unit-tests libunwindstack_unit_test debuggerd_test
Bug: 142944931
Bug: 151466650
Change-Id: Iaacb29bfe602f3ca12a00a712e2a64c45ff0118b
A future change will introduce a version lock between linker and
crash_dump. Move crash_dump into the runtime APEX alongside linker in order to
ensure that they will be the same version even if the runtime APEX is updated.
Bug: 135772972
Change-Id: Ic2eae31b6927eb0e8a62315ac141f50933c00bcc
Merged-In: Ic2eae31b6927eb0e8a62315ac141f50933c00bcc
We're now passing around a couple of addresses for GWP-ASan in addition
to abort_msg_address and fdsan_table_address, and I'm going to need to add
more of them for MTE. Move them into a data structure in order to simplify
various function signatures.
Bug: 135772972
Change-Id: Ie01e1bd93a9ab64f21865f56574696825a6a125f
On userdebug/eng devices, check a system property to see whether we
should create tombstones or not. OEMs that would rather have core dumps
can set this property and configure /proc/sys/kernel/core_pattern
appropriately.
Bug: https://issuetracker.google.com/149663286
Test: set the property, cause a crash
Change-Id: If894b4582a1820b64bdae819cec593b7710cb6e3
GWP-ASan can provide information about a crash that it caused. Grab the
GWP-ASan regions from the globals shared by the linker for crash-handler
purpopses, pull the information from GWP-ASan, and display it.
This adds two regions:
1. Causality tracking by GWP-ASan. We now print a cause header about
the crash, like `Cause: [GWP-ASan]: Use After Free on a 1-byte
allocation at 0x7365bb3ff8`
2. Allocation and deallocation stack traces.
Bug: 135634846
Test: atest debuggerd_test
Change-Id: Id28d5400c9a9a053fcde83a4788f971e677d4643
use Android.bp instead of Android.mk to build and install the
crash_dump.policy files. This also allows mainline modules to pull
the files into their apex (dependency wasn't handled for Android.mk)
Bug: 147914640
Test: build, examine generated filesystem
Change-Id: Iae92d4f9d683ccfddf1716e7eb2877b7bff0c737
This takes a lot of space, isn't convincingly useful, and makes it
likely that the far more valuable stuff that comes after it gets
truncated. So let's just drop it.
Bug: http://b/139860930
Test: manual crasher, presubmit
Change-Id: Ie417ffc07e3cb17e95fdb3d183f8c87de0f34b89
1 page isn't enough to log on AArch64, and clean pages are free, so
increase the stack size to 8 pages.
Bug: http://b/144887737
Test: treehugger
Change-Id: I731b3bc27ab37f4b830a9478a04cd34d4f7648d3
GWP-ASan's crash information retrieval services requires a Printf()
function (declared by the system/implementing allocator). In this
instance, because _LOG is called with additional arguments (the log_t),
this function must be wrapped to conform to printf_t defined by
GWP-ASan.
We can easily wrap the variadic version.
Bug: 135634846
Test: atest debuggerd_test
Change-Id: I17209cd2b7455ce889e2f8194969f606cac329eb
A thread's PSTATE can sometimes be critical for understanding a crash,
especially with MTE and other new features that store per-thread state
in PSTATE.
Bug: 135772972
Change-Id: I1bee25bffe7eea395f04b6449dc9227298cf866e
logger_entry and logger_entry_v2 were used for the kernel logger,
which we have long since deprecated. logger_entry_v3 is the same as
logger_entry_v4 without a uid field, so it is trivially removable,
especially since we're now always providing uids in log messages.
liblog and logd already get updated in sync with each other, so we
have no reason for backwards compatibility with their format.
Test: build, unit tests
Change-Id: I27c90609f28c8d826e5614fdb3fe59bde22b5042
debuggerd_client.race seems to have suddenly started to flake, for no
apparent reason. This doesn't seem to reproduce locally, so increase
the timeouts to rule out our test VMs being slow.
Bug: http://b/142571257
Test: treehugger
Change-Id: Ic54a78b8da36cb1163cec7e7976c73c3da628a30
C++20 wants members to be ordered unlike C99.
Bug: 139945549
Test: mm
Change-Id: I3cbca589511c1e0bbc10c691949e18de77e16031
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
We're missing useful crashes, especially on hwasan builds.
Bug: http://b/140580637
Test: run crasher
Change-Id: Ib5d8d3bd3fc4d7fec77d0b10302e5595f97a3515
There is still some flakiness, so increase the timeout values.
Also remove the TEMP_FAILURE_RETRY macro usage in TIMEOUT calls.
That macro disables the ability of the alarm code to interrupt
the system call.
Bug: 141045754
Test: Unit tests pass.
Change-Id: Ia3c95dccc3076a3fd5ef6432097a57e4ccee4df3
The fdsan code uses getrlimit/ugetrlimit so need to allow that when
running the debuggerd unit tests.
Bug: 141045754
Test: Ran the offending tests hundreds of times without failure.
Change-Id: Iece94f03e7895d61ca8a8f3ab17dce7e54ddf9cd
Catch as many early-boot crashes as we can by starting tombstoned
immediately after /data is mounted.
Bug: http://b/139864948
Test: adb shell su 0 dmesg | grep "starting service"
Change-Id: I7f8821102191a445e87020f3efa59a2e0620d9db
Since only privileged processes with CAP_SYS_ADMIN can read kernel
stack traces from /proc/*/stack, we dump the waiting channels
instead to provide some insight as to where the process might
be stuck in the kernel.
Bug: 135458700
Fixes: 135458700
Test: adb shell am hang; Check /data/anr/<anr-file> for
wchan data.
Change-Id: I9f13511ad89a259ce5e5465155db15d45d2c46d8