f5f04b19fe
The first time should_trace() returns true, bionic_trace_begin() calls open() on trace_marker. The problem is that open() can call bionic_trace_begin(). We've observed this happening, for example when: * fdtrack is enabled. dlopen("libfdtrack.so") can be used to enable fdtrack on a process. * ThreadA is busy unwinding inside fdtrack and is holding an fdtrack internal mutex. * ThreadB calls bionic_trace_begin() for the first time since the property "debug.atrace.tags.enableflags" contains ATRACE_TAG_BIONIC. * ThreadB calls open("/sys/kernel/tracing/trace_marker"). Since fdtrack is enabled, ThreadB tries to do unwinding as well. * ThreadB, inside fdtrack's unwinding tries to grab the same mutex that ThreadA is holding. * Mutex contention is reported using bionic_systrace, therefore bionic_trace_begin() is called again on ThreadB. * ThreadB tries to grab g_lock in bionin_systrace.cpp, but that's already held by ThreadB itself, earlier on the stack. Therefore ThreadB is stuck. I managed to reproduce the above scenario by manually pausing ThreadA inside unwinding with a debugger and letting ThreadB hitting bionic_trace_begin() for the first time. We could avoid using g_lock while calling open() (either by releasing g_lock and reacquiring it later, or by using atomics), but bionic_trace_begin() would try to call open() again. In my tests, open() does not call bionic_trace_begin() a third time, because fdtrack has reentrancy protection, but there might be another code path inside open that calls bionic_trace_begin again (it could be racy or only happen in certain configurations). This commit fixes the problem by implementing reentrancy protection in bionic_systrace. Sample callstack from ThreadA deadlocked before the fix: ``` * frame #0: 0x0000007436db077c libc.so`syscall at syscall.S:41 frame #1: 0x0000007436db0ba0 libc.so`bionic_trace_begin(char const*) [inlined] __futex(ftx=0x000000743737a548, op=<unavailable>, value=2, timeout=0x0000000000000000, bitset=-1) at bionic_futex.h:45:16 frame #2: 0x0000007436db0b8c libc.so`bionic_trace_begin(char const*) [inlined] __futex_wait_ex(ftx=0x000000743737a548, value=2) at bionic_futex.h:66:10 frame #3: 0x0000007436db0b78 libc.so`bionic_trace_begin(char const*) [inlined] Lock::lock(this=0x000000743737a548) at bionic_lock.h:67:7 frame #4: 0x0000007436db0b74 libc.so`bionic_trace_begin(char const*) [inlined] should_trace() at bionic_systrace.cpp:38:10 frame #5: 0x0000007436db0b74 libc.so`bionic_trace_begin(message="Contending for pthread mutex") at bionic_systrace.cpp:59:8 frame #6: 0x0000007436e193e4 libc.so`NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*) [inlined] NonPI::NormalMutexLock(mutex=0x0000007296cae9f0, shared=0, use_realtime_clock=false, abs_timeout_or_null=0x0000000000000000) at pthread_mutex.cpp:592:17 frame #7: 0x0000007436e193c8 libc.so`NonPI::MutexLockWithTimeout(mutex=0x0000007296cae9f0, use_realtime_clock=false, abs_timeout_or_null=0x0000000000000000) at pthread_mutex.cpp:719:16 frame #8: 0x0000007436e1912c libc.so`::pthread_mutex_lock(mutex_interface=<unavailable>) at pthread_mutex.cpp:839:12 [artificial] frame #9: 0x00000071a4e5b290 libfdtrack.so`std::__1::mutex::lock() [inlined] std::__1::__libcpp_mutex_lock(__m=<unavailable>) at __threading_support:256:10 frame #10: 0x00000071a4e5b28c libfdtrack.so`std::__1::mutex::lock(this=<unavailable>) at mutex.cpp:31:14 frame #11: 0x00000071a4e32634 libfdtrack.so`unwindstack::Elf::Step(unsigned long, unwindstack::Regs*, unwindstack::Memory*, bool*, bool*) [inlined] std::__1::lock_guard<std::__1::mutex>::lock_guard(__m=0x0000007296cae9f0) at __mutex_base:104:27 frame #12: 0x00000071a4e32618 libfdtrack.so`unwindstack::Elf::Step(this=0x0000007296cae9c0, rel_pc=66116, regs=0x0000007266ca0470, process_memory=0x0000007246caa130, finished=0x0000007ff910efb4, is_signal_frame=0x0000007ff910efb0) at Elf.cpp:206:31 frame #13: 0x00000071a4e2b3b0 libfdtrack.so`unwindstack::LocalUnwinder::Unwind(this=0x00000071a4ea1528, frame_info=<unavailable>, max_frames=34) at LocalUnwinder.cpp:102:22 frame #14: 0x00000071a4e2a3ec libfdtrack.so`fd_hook(event=<unavailable>) at fdtrack.cpp:119:18 frame #15: 0x0000007436dbf684 libc.so`::__open_2(pathname=<unavailable>, flags=<unavailable>) at open.cpp:72:10 frame #16: 0x0000007436db0a04 libc.so`bionic_trace_begin(char const*) [inlined] open(pathname=<unavailable>, flags=524289) at fcntl.h:63:12 frame #17: 0x0000007436db09f0 libc.so`bionic_trace_begin(char const*) [inlined] get_trace_marker_fd() at bionic_systrace.cpp:49:25 frame #18: 0x0000007436db09c0 libc.so`bionic_trace_begin(message="pthread_create") at bionic_systrace.cpp:63:25 ``` Bug: 213642769 Change-Id: I10d331859045cb4a8609b007f5c6cf2577ff44df
151 lines
4.4 KiB
C++
151 lines
4.4 KiB
C++
/*
|
|
* Copyright (C) 2014 The Android Open Source Project
|
|
*
|
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
|
* you may not use this file except in compliance with the License.
|
|
* You may obtain a copy of the License at
|
|
*
|
|
* http://www.apache.org/licenses/LICENSE-2.0
|
|
*
|
|
* Unless required by applicable law or agreed to in writing, software
|
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
* See the License for the specific language governing permissions and
|
|
* limitations under the License.
|
|
*/
|
|
|
|
#include <errno.h>
|
|
#include <fcntl.h>
|
|
#include <stdio.h>
|
|
#include <stdlib.h>
|
|
#include <string.h>
|
|
|
|
#include "bionic/pthread_internal.h"
|
|
#include "private/bionic_lock.h"
|
|
#include "private/bionic_systrace.h"
|
|
#include "private/bionic_tls.h"
|
|
#include "private/CachedProperty.h"
|
|
|
|
#include <async_safe/log.h>
|
|
#include <cutils/trace.h> // For ATRACE_TAG_BIONIC.
|
|
|
|
#define WRITE_OFFSET 32
|
|
|
|
static Lock g_lock;
|
|
static CachedProperty g_debug_atrace_tags_enableflags("debug.atrace.tags.enableflags");
|
|
static uint64_t g_tags;
|
|
static int g_trace_marker_fd = -1;
|
|
|
|
static bool should_trace() {
|
|
g_lock.lock();
|
|
if (g_debug_atrace_tags_enableflags.DidChange()) {
|
|
g_tags = strtoull(g_debug_atrace_tags_enableflags.Get(), nullptr, 0);
|
|
}
|
|
g_lock.unlock();
|
|
return ((g_tags & ATRACE_TAG_BIONIC) != 0);
|
|
}
|
|
|
|
static int get_trace_marker_fd() {
|
|
g_lock.lock();
|
|
if (g_trace_marker_fd == -1) {
|
|
g_trace_marker_fd = open("/sys/kernel/tracing/trace_marker", O_CLOEXEC | O_WRONLY);
|
|
if (g_trace_marker_fd == -1) {
|
|
g_trace_marker_fd = open("/sys/kernel/debug/tracing/trace_marker", O_CLOEXEC | O_WRONLY);
|
|
}
|
|
}
|
|
g_lock.unlock();
|
|
return g_trace_marker_fd;
|
|
}
|
|
|
|
static void trace_begin_internal(const char* message) {
|
|
if (!should_trace()) {
|
|
return;
|
|
}
|
|
|
|
int trace_marker_fd = get_trace_marker_fd();
|
|
if (trace_marker_fd == -1) {
|
|
return;
|
|
}
|
|
|
|
// If bionic tracing has been enabled, then write the message to the
|
|
// kernel trace_marker.
|
|
int length = strlen(message);
|
|
char buf[length + WRITE_OFFSET];
|
|
size_t len = async_safe_format_buffer(buf, length + WRITE_OFFSET, "B|%d|%s", getpid(), message);
|
|
|
|
// Tracing may stop just after checking property and before writing the message.
|
|
// So the write is acceptable to fail. See b/20666100.
|
|
TEMP_FAILURE_RETRY(write(trace_marker_fd, buf, len));
|
|
}
|
|
|
|
void bionic_trace_begin(const char* message) {
|
|
// Some functions called by trace_begin_internal() can call
|
|
// bionic_trace_begin(). Prevent infinite recursion and non-recursive mutex
|
|
// deadlock by using a flag in the thread local storage.
|
|
bionic_tls& tls = __get_bionic_tls();
|
|
if (tls.bionic_systrace_disabled) {
|
|
return;
|
|
}
|
|
tls.bionic_systrace_disabled = true;
|
|
|
|
trace_begin_internal(message);
|
|
|
|
tls.bionic_systrace_disabled = false;
|
|
}
|
|
|
|
static void trace_end_internal() {
|
|
if (!should_trace()) {
|
|
return;
|
|
}
|
|
|
|
int trace_marker_fd = get_trace_marker_fd();
|
|
if (trace_marker_fd == -1) {
|
|
return;
|
|
}
|
|
|
|
// This code is intentionally "sub-optimal"; do not optimize this by inlining
|
|
// the E| string into the write.
|
|
//
|
|
// This is because if the const char* string passed to write(trace_marker) is not
|
|
// in resident memory (e.g. the page of the .rodata section that contains it has
|
|
// been paged out, or the anonymous page that contained a heap-based string is
|
|
// swapped in zram), the ftrace code will NOT page it in and instead report
|
|
// <faulted>.
|
|
//
|
|
// We "fix" this by putting the string on the stack, which is more unlikely
|
|
// to be paged out and pass the pointer to that instead.
|
|
//
|
|
// See b/197620214 for more context on this.
|
|
volatile char buf[2]{'E', '|'};
|
|
TEMP_FAILURE_RETRY(write(trace_marker_fd, const_cast<const char*>(buf), 2));
|
|
}
|
|
|
|
void bionic_trace_end() {
|
|
// Some functions called by trace_end_internal() can call
|
|
// bionic_trace_begin(). Prevent infinite recursion and non-recursive mutex
|
|
// deadlock by using a flag in the thread local storage.
|
|
bionic_tls& tls = __get_bionic_tls();
|
|
if (tls.bionic_systrace_disabled) {
|
|
return;
|
|
}
|
|
tls.bionic_systrace_disabled = true;
|
|
|
|
trace_end_internal();
|
|
|
|
tls.bionic_systrace_disabled = false;
|
|
}
|
|
|
|
ScopedTrace::ScopedTrace(const char* message) : called_end_(false) {
|
|
bionic_trace_begin(message);
|
|
}
|
|
|
|
ScopedTrace::~ScopedTrace() {
|
|
End();
|
|
}
|
|
|
|
void ScopedTrace::End() {
|
|
if (!called_end_) {
|
|
bionic_trace_end();
|
|
called_end_ = true;
|
|
}
|
|
}
|