Several previous changes conspired to make a mess of the thread list
in static binaries. This was most obvious when trying to call
pthread_key_delete(3) on the main thread.
Bug: http://code.google.com/p/android/issues/detail?id=36893
Change-Id: I2a2f553114d8fb40533c481252b410c10656da2e
Save thread id to *thread_out before new
thread is allowed to run else there's a
risk that the thread has finished and
been deleted when *thread_out is assigned.
Change-Id: I6b84c61a8df06840877d4ab036f26feace3192d8
A call to pthread_key_delete() after pthread_exit() have unmapped the stack of a thread
but before the ongoing pthread_join() have finished executing will result in an access
to unmapped memory.
Avoid this by invalidating the stack_base and tls pointers during pthread_exit().
This is based on the investigation and proprosed solution by
Srinavasa Nagaraju <srinavasa.x.nagaraju@sonyericsson.com>
Change-Id: I145fb5d57930e91b00f1609d7b2cd16a55d5b3a9
The creation of a thread succeeds even if the requested scheduling
parameters can not be set. This is not POSIX compliant, and even
worse, it leads to a wrong behavior. Let pthread_create() fail in this
case.
Change-Id: Ice66e2a720975c6bde9fe86c2cf8f649533a169c
Signed-off-by: Christian Bejram <christian.bejram@stericsson.com>
Since e19d702b8e, dlsym and friends use recursive mutexes that
require the current thread id, which is not available before the libc
constructor. This prevents us from using dlsym() in .preinit_array.
This change moves TLS initialization from libc constructor to the earliest
possible point - immediately after linker itself is relocated. As a result,
pthread_internal_t for the initial thread is available from the start.
As a bonus, values stored in TLS in .preinit_array are not lost when libc is
initialized.
Change-Id: Iee5a710ee000173bff63e924adeb4a4c600c1e2d
First commit:
Revert "Revert "am be741d47: am 2f460fbe: am 73b5cad9: Merge "bionic: Fix wrong kernel_id in pthread descriptor after fork()"""
This reverts commit 06823da2f0.
Second commit:
bionic: fix atfork hanlder_mutex deadlock
This cherry-picks commit 34e89c232d
After applying the kernel_id fix, the system refused to boot up and we
got following crash log:
I/DEBUG ( 113): pid: 618, tid: 618 >>> org.simalliance.openmobileapi.service:remote <<<
I/DEBUG ( 113): signal 16 (SIGSTKFLT), code -6 (?), fault addr --------
I/DEBUG ( 113): eax fffffe00 ebx b77de994 ecx 00000080 edx 00724002
I/DEBUG ( 113): esi 00000000 edi 00004000
I/DEBUG ( 113): xcs 00000073 xds 0000007b xes 0000007b xfs 00000000 xss 0000007b
I/DEBUG ( 113): eip b7761351 ebp bfdf3de8 esp bfdf3dc4 flags 00000202
I/DEBUG ( 113): #00 eip: 00015351 /system/lib/libc.so
I/DEBUG ( 113): #01 eip: 0000d13c /system/lib/libc.so (pthread_mutex_lock)
I/DEBUG ( 113): #02 eip: 00077b48 /system/lib/libc.so (__bionic_atfork_run_prepare)
I/DEBUG ( 113): #03 eip: 00052cdb /system/lib/libc.so (fork)
I/DEBUG ( 113): #04 eip: 0009ae91 /system/lib/libdvm.so (_Z18dvmOptimizeDexFileillPKcjjb)
I/DEBUG ( 113): #05 eip: 000819d6 /system/lib/libdvm.so (_Z14dvmJarFileOpenPKcS0_PP7JarFileb)
I/DEBUG ( 113): #06 eip: 000b175e /system/lib/libdvm.so (_ZL40Dalvik_dalvik_system_DexFile_openDexFilePKjP6JValue)
I/DEBUG ( 113): #07 eip: 0011fb94 /system/lib/libdvm.so
Root cause:
The atfork uses the mutex handler_mutex to protect the atfork_head. The
parent will call __bionic_atfork_run_prepare() to lock the handler_mutex,
and need both the parent and child to unlock their own copy of handler_mutex
after fork. At that time, the owner of hanlder_mutex is set as the parent.
If we apply the kernel_id fix, then the child's kernel_id will be set as
child's tid.
The handler_mutex is a recursive lock, and pthread_mutex_unlock(&hander_mutex)
will fail because the mutex owner is the parent, while the current tid
(__get_thread()->kernel_id) is child, not matched with the mutex owner.
At that time, the handler_mutex is left in lock state.If the child wants to
fork other process after than, then it will try to lock handler_mutex, and
then be deadlocked.
Fix:
Since the child has its own copy of vm space from the the parent, the
child space's handler_mutex should be reset to the initialized state.
Change-Id: I3907dd9a153418fb78862f2aa6d0302c375d9e27
Signed-off-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Chenyang Du <chenyang.du@intel.com>
Signed-off-by: Bruce Beare <bruce.j.beare@intel.com>
Change-Id: Ic8072f366a877443a60fe215f3c00b3df5a259c8
After forking, the kernel_id field in the phtread_internal_t returned by pthread_self()
is incorrect --- it's the tid from the parent, not the new tid of the
child.
The root cause is that: currently the kernel_id is set by
_init_thread(), which is called in 2 cases:
(1) called by __libc_init_common(). That happens when the execv( ) is
called after fork( ). But when the zygote tries to fork the android
application, the child application doesn't call execv( ), instread, it
tries to call the Java main method directly.
(2) called by pthread_create(). That happens when a new thread is
created.
For the lead thread which is the thread created by fork(), it should
call execv() but it doesn't, as described in (1) above. So its kernel_id
will inherit the parent's kernel_id.
Fixed it in this patch.
Change-Id: I63513e82af40ec5fe51fbb69456b1843e4bc0fc7
Signed-off-by: Chenyang Du <chenyang.du@intel.com>
Signed-off-by: Jack Ren <jack.ren@intel.com>
Signed-off-by: Bruce Beare <bruce.j.beare@intel.com>
This optimization improves the performance of recursive locks
drastically. When running the thread_stress program on a Xoom,
the total time to perform all operations goes from 1500 ms to
500 ms on average after this change is pushed to the device.
Change-Id: I5d9407a9191bdefdaccff7e7edefc096ebba9a9d
This fixes a bug that was introduced in the latest pthread optimization.
It happens when a recursive lock is contented by several threads. The main
issue was that the atomic counter increment in _recursive_increment() could
be annihilated by a non-conditional write in pthread_mutex_lock() used to
update the value's lower bits to indicate contention.
This patch re-introduces the use of the global recursive lock in
_recursive_increment(). This will hit performance, but a future patch
will be provided to remove it from the source code.
Change-Id: Ie22069d376cebf2e7d613ba00b6871567f333544
this works by building a directed graph of acquired
pthread mutexes and making sure there are no loops in
that graph.
this feature is enabled with:
setprop debug.libc.pthread 1
when a potential deadlock is detected, a large warning is
output to the log with appropriate back traces.
currently disabled at compile-time. set PTHREAD_DEBUG_ENABLED=1
to enable.
Change-Id: I916eed2319599e8aaf8f229d3f18a8ddbec3aa8a
This patch provides several small optimizations to the
implementation of mutex locking and unlocking. Note that
a following patch will get rid of the global recursion
lock, and provide a few more aggressive changes, I
though it'd be simpler to split this change in two parts.
+ New behaviour: pthread_mutex_lock et al now detect
recursive mutex overflows and will return EAGAIN in
this case, as suggested by POSIX. Before, the counter
would just wrap to 0.
- Remove un-necessary reloads of the mutex value from memory
by storing it in a local variable (mvalue)
- Remove un-necessary reload of the mutex value by passing
the 'shared' local variable to _normal_lock / _normal_unlock
- Remove un-necessary reload of the mutex value by using a
new macro (MUTEX_VALUE_OWNER()) to compare the thread id
for recursive/errorcheck mutexes
- Use a common inlined function to increment the counter
of a recursive mutex. Also do not use the global
recursion lock in this case to speed it up.
Change-Id: I106934ec3a8718f8f852ef547f3f0e9d9435c816
This patch changes the implementation of pthread_once()
to avoid the use of a single global recursive mutex. This
should also slightly speed up the non-common case where
we have to call the init function, or wait for another
thread to finish the call.
Change-Id: I8a93f4386c56fb89b5d0eb716689c2ce43bdcad9
Pass kernel space sigset_t size to __rt_sigprocmask to workaround
the miss-match of NSIG/sigset_t definition between kernel and bionic.
Note: Patch originally from Google...
Change-Id: I4840fdc56d0b90d7ce2334250f04a84caffcba2a
Signed-off-by: Chenyang Du <chenyang.du@intel.com>
Signed-off-by: Bruce Beare <bruce.j.beare@intel.com>
(1) in pthread_create:
If the one signal is received before esp is subtracted by 16 and
__thread_entry( ) is called, the stack will be cleared by kernel
when it tries to contruct the signal stack frame. That will cause
that __thread_entry will get a wrong tls pointer from the stack
which leads to the segment fault when trying to access tls content.
(2) in pthread_exit
After pthread_exit called system call unmap(), its stack will be
freed. If one signal is received at that time, there is no stack
available for it.
Fixed by subtracting the child's esp by 16 before the clone system
call and by blocking signal handling before pthread_exit is started.
Author: Jack Ren <jack.ren@intel.com>
Signed-off-by: Bruce Beare <bruce.j.beare@intel.com>
Use tgkill instead of tkill to implement pthread_kill.
This is safer in the event that the thread has already terminated
and its id has been reused by a different process.
Change-Id: Ied715e11d7eadeceead79f33db5e2b5722954ac9
Allow the kernel to choose a memory location to put the
thread stack, rather than hard coding 0x10000000
Change-Id: Ib1f37cf0273d4977e8d274fbdab9431ec1b7cb4f
We're going to modify the __atomic_xxx implementation to provide
full memory barriers, to avoid problems for NDK machine code that
link to these functions.
First step is to remove their usage from our platform code.
We now use inlined versions of the same functions for a slight
performance boost.
+ remove obsolete atomics_x86.c (was never compiled)
NOTE: This improvement was benchmarked on various devices.
Comparing a pthread mutex lock + atomic increment + unlock
we get:
- ARMv7 emulator, running on a 2.4 GHz Xeon:
before: 396 ns after: 288 ns
- x86 emulator in KVM mode on same machine:
before: 27 ns after: 27 ns
- Google Nexus S, in ARMv7 mode (single-core):
before: 82 ns after: 76 ns
- Motorola Xoom, in ARMv7 mode (multi-core):
before: 121 ns after: 120 ns
The code has also been rebuilt in ARMv5TE mode for correctness.
Change-Id: Ic1dc72b173d59b2e7af901dd70d6a72fb2f64b17
The old code didn't work because the kernel expects a 64-bit sigset_t
while the one provided by our ABI is only 32-bit. This is originally
due to the fact that the kernel headers themselves define sigset_t
as a 32-bit type when __KERNEL__ is not defined (apparently to cater
to libc5 or some similarly old C library).
We can't modify the size of sigset_t without breaking the NDK ABI,
so instead perform runtime translation during the call.
Change-Id: Ibfdc3cbceaff864af7a05ca193aa050047b4773f
After this change, SIGRTMAX will be set to 64 (instead of 32 currently).
Note that this doesn't change the fact that our sigset_t is still defined
as a 32-bit unsigned integer, so most functions that deal with this type
won't support real-time signals though.
Change-Id: Ie1e2f97d646f1664f05a0ac9cac4a43278c3cfa8
The implementation was using a double-checked locking approach that
could break on SMP.
In addition to the barriers I also switched to a volatile pointer. I
don't think this will matter unless gcc can conclude that _normal_lock
can't affect *once_control, but I figured it was better to be safe.
(It seems to have no impact whatsoever on the generated code.)
Bug 3022795.
Change-Id: Ib91da25d57ff5bee4288526e39d457153ef6aacd
This adds an explicit memory barrier to condition variable signaling.
It's a little murky as to whether it's strictly required, but it seems
like a wise thing to do.
Change-Id: Id0faa542d61e4b8ffa775e4adf68e4d7471f4fb7
Also add missing declarations to misc. functions.
Fix clearerr() implementation (previous was broken).
Handle feature test macros like _POSIX_C_SOURCE properly.
Change-Id: Icdc973a6b9d550a166fc2545f727ea837fe800c4
We simply copy the stuff we need from cutils headers.
A future patch will change cutils to include the private <bionic_atomic_inline.h>
Change-Id: Ib6fd9a03bc9e337ce867bd606dc94c2b4438480a
Update ARM atomic ops to use LDREX/STREX. Stripped out #if 0 chunk.
Insert explicit memory barriers in pthread and semaphore code.
For bug 2721865.
Change-Id: I0f153b797753a655702d8be41679273d1d5d6ae7
... so that each cloned process at the kernel level can be named
independently. Tools like 'top' can display the CPU/memory statistics
for each process's thread if "Show Threads" mode is on.
With this function in place, we can convert dalvik/Thread.c setThreadName()
function over this function. This feature ought to be provided by the
underlying C library and not coded directly in Dalvik.
Change-Id: Ifa997665dbaa114e0b126f8c667708be9a4137fd
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Now that the system properly uses shared condvars when needed, we
can enable the use of private futexes for them too.
Change-Id: Icf8351fc0a2309f764cba45c65bc3af047720cdf
This does not change the implementation of conditional variables
since we're waiting for other system components to properly use
pthread_condattr_init/setpshared before that.
Also remove an obsolete x86 source file.
Change-Id: Ia3e3fbac35b87a534fb04d4381c3c66b975bc8f7
Note that this does not change the implementation of conditional variables
which still use shared futexes, independent on the flags being selected.
This will be fixed in a later patch, once our system is modified to use
pthread_condattr_setpshared(attr, PTHREAD_PROCESS_SHARED) properly.
Change-Id: I935de50964cd41f97a13dbfd6626d3407b0406c3
Private futexes are a recent kernel addition: faster futexes that cannot be
shared between processes. This patch uses them by default, unless the PROCESS_SHARED
attribute flag is used when creating a mutex and/or conditional variable.
Also introduces pthread_condattr_init/destroy/setpshared/getpshared.
Change-Id: I3a0e2116f467072b046524cb5babc00e41057a53