Reorganize static TLS memory for ELF TLS
For ELF TLS "local-exec" accesses, the static linker assumes that an
executable's TLS segment is located at a statically-known offset from the
thread pointer (i.e. "variant 1" for ARM and "variant 2" for x86).
Because these layouts are incompatible, Bionic generally needs to allocate
its TLS slots differently between different architectures.
To allow per-architecture TLS slots:
- Replace the TLS_SLOT_xxx enumerators with macros. New ARM slots are
generally negative, while new x86 slots are generally positive.
- Define a bionic_tcb struct (sketched just below this list) that provides two things:
- a void* raw_slots_storage[BIONIC_TLS_SLOTS] field
- an inline accessor function: void*& tls_slot(size_t tpindex);
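A rough sketch of that struct, for illustration only (the real definition
lives in bionic's private headers, and the MIN_TLS_SLOT biasing shown here
is an assumption about how the negative ARM slot numbers are stored):

    struct bionic_tcb {
      void* raw_slots_storage[BIONIC_TLS_SLOTS];

      // Negative slot numbers (ARM) are biased by MIN_TLS_SLOT so that
      // every slot indexes into raw_slots_storage.
      void*& tls_slot(size_t tpindex) {
        return raw_slots_storage[tpindex - MIN_TLS_SLOT];
      }
    };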
For ELF TLS, it's necessary to allocate a temporary TCB (i.e. TLS slots),
because the runtime linker doesn't know how large the static TLS area is
until after it has loaded all of the initial solibs.
To accommodate Golang, it's necessary to allocate the pthread keys at a
fixed, small, positive offset from the thread pointer.
This CL moves the pthread keys into bionic_tls, then allocates a single
mapping per thread, laid out as follows:
- stack guard
- stack [omitted for main thread and with pthread_attr_setstack]
- static TLS:
- bionic_tcb [exec TLS will either precede or succeed the TCB]
- bionic_tls [prefixed by the pthread keys]
- [solib TLS segments will be placed here]
- guard page
As before, if the new mapping includes a stack, the pthread_internal_t
is allocated on it.
At startup, Bionic allocates a temporary bionic_tcb object on the stack,
then allocates a temporary bionic_tls object using mmap. This mmap is
delayed because the linker can't currently call async_safe_fatal() before
relocating itself.
Later, Bionic allocates a stack-less thread mapping for the main thread,
and copies slots from the temporary TCB to the new TCB.
(See *::copy_from_bootstrap methods.)
Bug: http://b/78026329
Test: bionic unit tests
Test: verify that a Golang app still works
Test: verify that a Golang app crashes if bionic_{tls,tcb} are swapped
Merged-In: I6543063752f4ec8ef6dc9c7f2a06ce2a18fc5af3
Change-Id: I6543063752f4ec8ef6dc9c7f2a06ce2a18fc5af3
(cherry picked from commit 1e660b70da625fcbf1e43dfae09b7b4817fa1660)

/*
 * Copyright (C) 2019 The Android Open Source Project
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *  * Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
 * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
 * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
 * OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
 * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */
#include "private/bionic_elf_tls.h"
#include <async_safe/log.h>
#include <string.h>
#include <sys/param.h>
#include <unistd.h>
#include "private/ScopedRWLock.h"
#include "private/ScopedSignalBlocker.h"
#include "private/bionic_globals.h"
#include "platform/bionic/macros.h"
#include "private/bionic_tls.h"
#include "pthread_internal.h"

// Every call to __tls_get_addr needs to check the generation counter, so
// accesses to the counter need to be as fast as possible. Keep a copy of it in
// a hidden variable, which can be accessed without using the GOT. The linker
// will update this variable when it updates its counter.
//
// To allow the linker to update this variable, libc.so's constructor passes its
// address to the linker. To accommodate a possible __tls_get_addr call before
// libc.so's constructor, this local copy is initialized to SIZE_MAX, forcing
// __tls_get_addr to initially use the slow path.
__LIBC_HIDDEN__ _Atomic(size_t) __libc_tls_generation_copy = SIZE_MAX;

// Search for a TLS segment in the given phdr table. Returns true if it has a
// TLS segment and false otherwise.
bool __bionic_get_tls_segment(const ElfW(Phdr)* phdr_table, size_t phdr_count,
                              ElfW(Addr) load_bias, TlsSegment* out) {
  for (size_t i = 0; i < phdr_count; ++i) {
    const ElfW(Phdr)& phdr = phdr_table[i];
    if (phdr.p_type == PT_TLS) {
      *out = TlsSegment {
        phdr.p_memsz,
        phdr.p_align,
        reinterpret_cast<void*>(load_bias + phdr.p_vaddr),
        phdr.p_filesz,
      };
      return true;
    }
  }
  return false;
}
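
// Illustrative usage sketch, not a real Bionic call site: locate the
// executable's own PT_TLS segment via the program headers from the aux vector.
// The function name is hypothetical, and the zero load_bias is an assumption
// that only holds when the executable is loaded at its link-time address;
// real callers compute the actual load bias.
#include <sys/auxv.h>

static inline bool example_find_exe_tls_segment(TlsSegment* out) {
  const ElfW(Phdr)* phdrs = reinterpret_cast<const ElfW(Phdr)*>(getauxval(AT_PHDR));
  const size_t phdr_count = getauxval(AT_PHNUM);
  return __bionic_get_tls_segment(phdrs, phdr_count, /*load_bias=*/0, out);
}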

// Return true if the alignment of a TLS segment is a valid power-of-two. Also
// cap the alignment if it's too high.
bool __bionic_check_tls_alignment(size_t* alignment) {
  // N.B. The size does not need to be a multiple of the alignment. With
  // ld.bfd (or after using binutils' strip), the TLS segment's size isn't
  // rounded up.
  if (*alignment == 0 || !powerof2(*alignment)) {
    return false;
  }
  // Bionic only respects TLS alignment up to one page.
  *alignment = MIN(*alignment, PAGE_SIZE);
  return true;
}

size_t StaticTlsLayout::offset_thread_pointer() const {
  return offset_bionic_tcb_ + (-MIN_TLS_SLOT * sizeof(void*));
}

// Reserves space for the Bionic TCB and the executable's TLS segment. Returns
// the offset of the executable's TLS segment.
size_t StaticTlsLayout::reserve_exe_segment_and_tcb(const TlsSegment* exe_segment,
                                                    const char* progname __attribute__((unused))) {
  // Special case: if the executable has no TLS segment, then just allocate a
  // TCB and skip the minimum alignment check on ARM.
  if (exe_segment == nullptr) {
    offset_bionic_tcb_ = reserve_type<bionic_tcb>();
    return 0;
  }

#if defined(__arm__) || defined(__aarch64__)

  // First reserve enough space for the TCB before the executable segment.
  reserve(sizeof(bionic_tcb), 1);

  // Then reserve the segment itself.
  const size_t result = reserve(exe_segment->size, exe_segment->alignment);

  // The variant 1 ABI that ARM linkers follow specifies a 2-word TCB between
  // the thread pointer and the start of the executable's TLS segment, but both
  // the thread pointer and the TLS segment are aligned appropriately for the
  // TLS segment. Calculate the distance between the thread pointer and the
  // EXE's segment.
  const size_t exe_tpoff = __BIONIC_ALIGN(sizeof(void*) * 2, exe_segment->alignment);

  const size_t min_bionic_alignment = BIONIC_ROUND_UP_POWER_OF_2(MAX_TLS_SLOT) * sizeof(void*);
  if (exe_tpoff < min_bionic_alignment) {
    async_safe_fatal("error: \"%s\": executable's TLS segment is underaligned: "
                     "alignment is %zu, needs to be at least %zu for %s Bionic",
                     progname, exe_segment->alignment, min_bionic_alignment,
                     (sizeof(void*) == 4 ? "ARM" : "ARM64"));
  }

  offset_bionic_tcb_ = result - exe_tpoff - (-MIN_TLS_SLOT * sizeof(void*));
  return result;

#elif defined(__i386__) || defined(__x86_64__)

  // x86 uses variant 2 TLS layout. The executable's segment is located just
  // before the TCB.
  static_assert(MIN_TLS_SLOT == 0, "First slot of bionic_tcb must be slot #0 on x86");
  const size_t exe_size = round_up_with_overflow_check(exe_segment->size, exe_segment->alignment);
  reserve(exe_size, 1);
  const size_t max_align = MAX(alignof(bionic_tcb), exe_segment->alignment);
  offset_bionic_tcb_ = reserve(sizeof(bionic_tcb), max_align);
  return offset_bionic_tcb_ - exe_size;

#else
#error "Unrecognized architecture"
#endif
}
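
// Worked example of the variant 1 math above (illustrative; MIN_TLS_SLOT is
// left symbolic because its value is defined elsewhere): on arm64, with an
// executable TLS segment aligned to 16 bytes, exe_tpoff = __BIONIC_ALIGN(16, 16)
// = 16, so the thread pointer ends up exactly two words before the segment,
// the classic variant 1 TCB. offset_bionic_tcb_ is then
// result - 16 - (-MIN_TLS_SLOT * 8), which places bionic_tcb so that its
// negative slots sit immediately below the thread pointer.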

void StaticTlsLayout::reserve_bionic_tls() {
  offset_bionic_tls_ = reserve_type<bionic_tls>();
}

void StaticTlsLayout::finish_layout() {
  // Round the offset up to the alignment.
  offset_ = round_up_with_overflow_check(offset_, alignment_);

  if (overflowed_) {
    async_safe_fatal("error: TLS segments in static TLS overflowed");
  }
}

// The size is not required to be a multiple of the alignment. The alignment
// must be a positive power-of-two.
size_t StaticTlsLayout::reserve(size_t size, size_t alignment) {
  offset_ = round_up_with_overflow_check(offset_, alignment);
  const size_t result = offset_;
  if (__builtin_add_overflow(offset_, size, &offset_)) overflowed_ = true;
  alignment_ = MAX(alignment_, alignment);
  return result;
}
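
// Worked example of reserve(), derived from the code above: starting from
// offset_ == 20 and alignment_ == 4, a call to reserve(/*size=*/10,
// /*alignment=*/8) rounds offset_ up to 24, returns 24, and leaves
// offset_ == 34 and alignment_ == 8.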

size_t StaticTlsLayout::round_up_with_overflow_check(size_t value, size_t alignment) {
  const size_t old_value = value;
  value = __BIONIC_ALIGN(value, alignment);
  if (value < old_value) overflowed_ = true;
  return value;
}

// Copy each TLS module's initialization image into a newly-allocated block of
// static TLS memory. To reduce dirty pages, this function only writes to pages
// within the static TLS that need initialization. The memory should already be
// zero-initialized on entry.
void __init_static_tls(void* static_tls) {
  // The part of the table we care about (i.e. static TLS modules) never changes
  // after startup, but we still need the mutex because the table could grow,
  // moving the initial part. If this locking is too slow, we can duplicate the
  // static part of the table.
  TlsModules& modules = __libc_shared_globals()->tls_modules;
  ScopedSignalBlocker ssb;
  ScopedReadLock locker(&modules.rwlock);

  for (size_t i = 0; i < modules.module_count; ++i) {
    TlsModule& module = modules.module_table[i];
    if (module.static_offset == SIZE_MAX) {
      // All of the static modules come before all of the dynamic modules, so
      // once we see the first dynamic module, we're done.
      break;
    }
    if (module.segment.init_size == 0) {
      // Skip the memcpy call for TLS segments with no initializer, which is
      // common.
      continue;
    }
    memcpy(static_cast<char*>(static_tls) + module.static_offset,
           module.segment.init_ptr,
           module.segment.init_size);
  }
}

static inline size_t dtv_size_in_bytes(size_t module_count) {
  return sizeof(TlsDtv) + module_count * sizeof(void*);
}

// Calculates the number of module slots to allocate in a new DTV. For small
// objects (up to 1KiB), the TLS allocator allocates memory in power-of-2 sizes,
// so for better space usage, ensure that the DTV size (header + slots) is a
// power of 2.
//
// The lock on TlsModules must be held.
static size_t calculate_new_dtv_count() {
  size_t loaded_cnt = __libc_shared_globals()->tls_modules.module_count;
  size_t bytes = dtv_size_in_bytes(MAX(1, loaded_cnt));
  if (!powerof2(bytes)) {
    bytes = BIONIC_ROUND_UP_POWER_OF_2(bytes);
  }
  return (bytes - sizeof(TlsDtv)) / sizeof(void*);
}
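
// Worked example, assuming the TlsDtv header is three pointer-sized fields
// (24 bytes on a 64-bit target): with 5 loaded modules, dtv_size_in_bytes(5)
// is 24 + 5*8 = 64, already a power of two, so the new DTV gets
// (64 - 24) / 8 = 5 slots. With 6 modules, 72 bytes rounds up to 128, giving
// (128 - 24) / 8 = 13 slots of headroom for future dlopens.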

// This function must be called with signals blocked and a write lock on
// TlsModules held.
static void update_tls_dtv(bionic_tcb* tcb) {
  const TlsModules& modules = __libc_shared_globals()->tls_modules;
  BionicAllocator& allocator = __libc_shared_globals()->tls_allocator;

  // Use the generation counter from the shared globals instead of the local
  // copy, which won't be initialized yet if __tls_get_addr is called before
  // libc.so's constructor.
  if (__get_tcb_dtv(tcb)->generation == atomic_load(&modules.generation)) {
    return;
  }

  const size_t old_cnt = __get_tcb_dtv(tcb)->count;

  // If the DTV isn't large enough, allocate a larger one. Because a signal
  // handler could interrupt the fast path of __tls_get_addr, we don't free the
  // old DTV. Instead, we add the old DTV to a list, then free all of a thread's
  // DTVs at thread-exit. Each time the DTV is reallocated, its size at least
  // doubles.
  if (modules.module_count > old_cnt) {
    size_t new_cnt = calculate_new_dtv_count();
    TlsDtv* const old_dtv = __get_tcb_dtv(tcb);
    TlsDtv* const new_dtv = static_cast<TlsDtv*>(allocator.alloc(dtv_size_in_bytes(new_cnt)));
    memcpy(new_dtv, old_dtv, dtv_size_in_bytes(old_cnt));
    new_dtv->count = new_cnt;
    new_dtv->next = old_dtv;
    __set_tcb_dtv(tcb, new_dtv);
  }

  TlsDtv* const dtv = __get_tcb_dtv(tcb);

  const StaticTlsLayout& layout = __libc_shared_globals()->static_tls_layout;
  char* static_tls = reinterpret_cast<char*>(tcb) - layout.offset_bionic_tcb();

  // Initialize static TLS modules and free unloaded modules.
  for (size_t i = 0; i < dtv->count; ++i) {
    if (i < modules.module_count) {
      const TlsModule& mod = modules.module_table[i];
      if (mod.static_offset != SIZE_MAX) {
        dtv->modules[i] = static_tls + mod.static_offset;
        continue;
      }
      if (mod.first_generation != kTlsGenerationNone &&
          mod.first_generation <= dtv->generation) {
        continue;
      }
    }
    allocator.free(dtv->modules[i]);
    dtv->modules[i] = nullptr;
  }

  dtv->generation = atomic_load(&modules.generation);
}

__attribute__((noinline)) static void* tls_get_addr_slow_path(const TlsIndex* ti) {
  TlsModules& modules = __libc_shared_globals()->tls_modules;
  bionic_tcb* tcb = __get_bionic_tcb();

  // Block signals and lock TlsModules. We may need the allocator, so take
  // a write lock.
  ScopedSignalBlocker ssb;
  ScopedWriteLock locker(&modules.rwlock);

  update_tls_dtv(tcb);

  TlsDtv* dtv = __get_tcb_dtv(tcb);
  const size_t module_idx = __tls_module_id_to_idx(ti->module_id);
  void* mod_ptr = dtv->modules[module_idx];
  if (mod_ptr == nullptr) {
    const TlsSegment& segment = modules.module_table[module_idx].segment;
    mod_ptr = __libc_shared_globals()->tls_allocator.memalign(segment.alignment, segment.size);
    if (segment.init_size > 0) {
      memcpy(mod_ptr, segment.init_ptr, segment.init_size);
    }
    dtv->modules[module_idx] = mod_ptr;
  }

  return static_cast<char*>(mod_ptr) + ti->offset;
}

// Returns the address of a thread's TLS memory given a module ID and an offset
// into that module's TLS segment. This function is called on every access to a
// dynamic TLS variable on targets that don't use TLSDESC. arm64 uses TLSDESC,
// so it only calls this function on a thread's first access to a module's TLS
// segment.
//
// On most targets, this accessor function is __tls_get_addr and
// TLS_GET_ADDR_CCONV is unset. 32-bit x86 uses ___tls_get_addr instead and a
// regparm() calling convention.
extern "C" void* TLS_GET_ADDR(const TlsIndex* ti) TLS_GET_ADDR_CCONV {
  TlsDtv* dtv = __get_tcb_dtv(__get_bionic_tcb());

  // TODO: See if we can use a relaxed memory ordering here instead.
  size_t generation = atomic_load(&__libc_tls_generation_copy);
  if (__predict_true(generation == dtv->generation)) {
    void* mod_ptr = dtv->modules[__tls_module_id_to_idx(ti->module_id)];
    if (__predict_true(mod_ptr != nullptr)) {
      return static_cast<char*>(mod_ptr) + ti->offset;
    }
  }

  return tls_get_addr_slow_path(ti);
}
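
// Illustrative sketch (not Bionic code) of how a call to this function arises:
// for a general-dynamic TLS variable defined in a dlopen'ed library, e.g.
//
//   __thread int counter;
//
// the toolchain emits a TlsIndex in the GOT whose module_id and offset are
// filled in by R_*_TLS_DTPMOD/DTPOFF dynamic relocations, and each access is
// lowered to roughly
//
//   int* p = static_cast<int*>(TLS_GET_ADDR(&counter_tls_index));
//
// where counter_tls_index is a hypothetical name for that GOT entry. On 32-bit
// x86 the call targets ___tls_get_addr with the regparm calling convention
// instead.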

// This function frees:
// - TLS modules referenced by the current DTV.
// - The list of DTV objects associated with the current thread.
//
// The caller must have already blocked signals.
void __free_dynamic_tls(bionic_tcb* tcb) {
  TlsModules& modules = __libc_shared_globals()->tls_modules;
  BionicAllocator& allocator = __libc_shared_globals()->tls_allocator;

  // If we didn't allocate any dynamic memory, skip out early without taking
  // the lock.
  TlsDtv* dtv = __get_tcb_dtv(tcb);
  if (dtv->generation == kTlsGenerationNone) {
    return;
  }

  // We need the write lock to use the allocator.
  ScopedWriteLock locker(&modules.rwlock);

  // First free everything in the current DTV.
  for (size_t i = 0; i < dtv->count; ++i) {
    if (i < modules.module_count && modules.module_table[i].static_offset != SIZE_MAX) {
      // This module's TLS memory is allocated statically, so don't free it here.
      continue;
    }
    allocator.free(dtv->modules[i]);
  }

  // Now free the thread's list of DTVs.
  while (dtv->generation != kTlsGenerationNone) {
    TlsDtv* next = dtv->next;
    allocator.free(dtv);
    dtv = next;
  }

  // Clear the DTV slot. The DTV must not be used again with this thread.
  tcb->tls_slot(TLS_SLOT_DTV) = nullptr;
}