Taking into account possibility that external symbol
could have been an OBJECT instead of function.
b/14090368
Change-Id: Iac173d2dd1309ed53024306578137c26b1dbbf15
Our <machine/asm.h> files were modified from upstream, to the extent
that no architecture was actually using the upstream ENTRY or END macros,
assuming that architecture even had such a macro upstream. This patch moves
everyone to the same macros, with just a few tweaks remaining in the
<machine/asm.h> files, which no one should now use directly.
I've removed most of the unused cruft from the <machine/asm.h> files, though
there's still rather a lot in the mips/mips64 ones.
Bug: 12229603
Change-Id: I2fff287dc571ac1087abe9070362fb9420d85d6d
This is required to make the Nexus 10 graphics driver work on a system
compiled with gcc 4.9.
Change-Id: If3f3d488652a736d9ea3e583548d74fae3ffa902
Signed-off-by: Bernhard Rosenkränzer <Bernhard.Rosenkranzer@linaro.org>
Also make the other architectures more similar to one another,
use NULL instead of 0 in calling code, and remove an unused #define.
Change-Id: I52b874afb6a351c802f201a0625e484df6d093bb
This is needed if we use Clang to compile Bionic, which won't include
__popcountsi2 anymore as Clang generates inline instructions. However
prebuilt binary blobs still depend on libc.so to resolve __popcountsi2.
Change-Id: I9001a3884c4be250c0ceebcd79922783fae1a0b7
Since the ENTRY/END macros now have .cfi_startproc/.cfi_endproc, most of the
custom arm assembly has no unwind information. Adding the proper cfi directives
for these and removing the arm directives.
Update the gensyscalls.py script to add these cfi directives for the generated
assembly. Also fix the references to non-uapi headers to the proper uapi
header.
In addition, remove the kill.S, tkill.S, tgkill.S for arm since they are not
needed at all. The unwinder (libunwind) is able to properly unwind using the
normal abort.
After this change, I can unwind through the system calls again.
Bug: 11559337
Bug: 11825869
Bug: 11321283
Change-Id: I18b48089ef2d000a67913ce6febc6544bbe934a3
The kernel now maintains the pthread_internal_t::tid field for us,
and __clone was only used in one place so let's inline it so we don't
have to leave such a dangerous function lying around. Also rename
files to match their content and remove some useless #includes.
Change-Id: I24299fb4a940e394de75f864ee36fdabbd9438f9
We only need it for MAX_ERRNO, and it's time we had somewhere to put
the little assembler utility macros we've been putting off writing.
Change-Id: I9354d2e0dc47c689296a34b5b229fc9ba75f1a83
The x86_64 build was failing because clone.S had a call to __thread_entry which
was being added to a different intermediate .a on the way to making libc.so,
and the linker couldn't guarantee statically that such a relocation would be
possible.
ld: error: out/target/product/generic_x86_64/obj/STATIC_LIBRARIES/libc_common_intermediates/libc_common.a(clone.o): requires dynamic R_X86_64_PC32 reloc against '__thread_entry' which may overflow at runtime; recompile with -fPIC
This patch addresses that by ensuring that the caller and callee end up in the
same intermediate .a. While I'm here, I've tried to clean up some of the mess
that led to this situation too. In particular, this removes libc/private/ from
the default include path (except for the DNS code), and splits out the DNS
code into its own library (since it's a weird special case of upstream NetBSD
code that's diverged so heavily it's unlikely ever to get back in sync).
There's more cleanup of the DNS situation possible, but this is definitely a
step in the right direction, and it's more than enough to get x86_64 building
cleanly.
Change-Id: I00425a7245b7a2573df16cc38798187d0729e7c4
We shouldn't have been passing the bottom 32 bits of the address used
for pthread_join to the kernel.
Change-Id: I487e5002d60c27adba51173719213abbee0f183f
memcpy.a15.S/strcmp.a15.S files were submitted by ARM for use as the basis
for the memcpy/strcmp implementations in cortex-a15.
memset.S was moved in to the generic directory.
NOTE: memcpy.a9.S was submitted by Linaro to be the basis for the memcpy
for cortex-a9/cortex-a15 but has not been incorporated yet.
Bug: 10971279
Merge from internal master.
(cherry-picked from 48fc3e8b9f)
Change-Id: I8f9297578990d517f004e4e8840e2b2cbd5a47d8
Create one version of strcat/strcpy/strlen for cortex-a15/krait and another
version for cortex-a9.
Tested with the libc_test strcat/strcpy/strlen tests.
Including new tests that verify that the src for strcat/strcpy do not
overread across page boundaries.
NOTE: The handling of unaligned strcpy (same code in strcat) could probably
be optimized further such that the src is read 64 bits at a time instead of
the partial reads occurring now.
strlen improves slightly since it was recently optimized.
Performance improvements for strcpy and strcat (using an empty dest string):
cortex-a9
- Small copies vary from about 5% to 20% as the size gets above 10 bytes.
- Copies >= 1024, about a 60% improvement.
- Unaligned copies, from about 40% improvement.
cortex-a15
- Most small copies exhibit a 100% improvement, a few copies only
improve by 20%.
- Copies >= 1024, about 150% improvement.
- Unaligned copies, about 100% improvement.
krait
- Most small copies vary widely, but on average 20% improvement, then
the performance gets better, hitting about a 100% improvement when
copies 64 bytes of data.
- Copies >= 1024, about 100% improvement.
- When coping MBs of data, about 50% improvement.
- Unaligned copies, about 90% improvement.
As strcat destination strings get larger in size:
cortex-a9
- about 40% improvement for small dst strings (>= 32).
- about 250% improvement for dst strings >= 1024.
cortex-a15
- about 200% improvement for small dst strings (>=32).
- about 250% improvement for dst strings >= 1024.
krait
- about 25% improvement for small dst strings (>=32).
- about 100% improvement for dst strings >=1024.
Merge from internal master.
(cherry-picked from d119b7b6f4)
Change-Id: I296463b251ef9fab004ee4dded2793feca5b547a
This is needed when passing -mcpu=cortex-a9 or higher on a modern
toolchain for prebuilt library compatibility
Change-Id: I73eb2393377914ae26216a8c2828ad973d1c1225
This optimized version is primarily targeted at cortex-a15.
Tested on all nexus devices using the system/extras/libc_test strlen test.
Tested alignments from 1 to 32 that are powers of 2.
Tested that strlen does not cross page boundaries at all alignments.
Speed improvements listed below:
cortex-a15
- Sizes >= 32 bytes, ~75% improvement.
- Sizes >= 1024 bytes, ~250% improvement.
cortex-a9
- Sizes >= 32 bytes, ~75% improvement.
- Sizes >= 1024 bytes, ~85% improvement.
krait
- Sizes >= 32 bytes, ~95% improvement.
- Sizes >= 1024 bytes, ~160% improvement.
Merge from internal master.
(cherry-picked from 2fc0717977)
Change-Id: I1ceceb4e745fd68e9d946f96d1d42e0cdaff6ccf
We cleaned up the auto-generated ones a while back to not touch
the stack unnecessarily if they have <= 4 arguments. This patch
cleans up some hand-crafted ones.
Also improve comments in clone.S.
Change-Id: I8850bf98f2b26829385315304472a760e6880ed8
This memcpy code uses NEON/VFP to achieve very good performance
on ARMv7-A processors. It is specifically tuned for A15 but should
provide good performance on A9 also. It is equivalent to the code
in cortex-strings rev 116.
This patch is a follow up the existing gerrit change:
I7f6f77995f3ca903ad9c66d14261441667a2a935
This version includes a tweak for performance on misaligned
buffers and splits the header comment into license and
documentation sections.
Change-Id: Ibd2e23c8d8e01357ba0247be1d05192de3ceba69
Signed-off-by: Will Newton <will.newton@linaro.org>
This memcpy code uses NEON/VFP to achieve very good performance
on ARMv7-A processors. It is specifically tuned for A15 but should
provide good performance on A9 also. It is equivalent to the code
in cortex-strings rev 116.
This patch is a follow up the existing gerrit change:
I7f6f77995f3ca903ad9c66d14261441667a2a935
But this version includes a tweak for performance on misaligned
buffers.
Change-Id: I285abac0068f8ae29a1cbf7862ea8590aadaf0a7
Signed-off-by: Will Newton <will.newton@linaro.org>
For some reason, socketcalls.c was only being compiled for ARM, where
it makes no sense. For x86 we generate stubs for the socket functions
that use __NR_socketcall directly.
Change-Id: I84181e6183fae2314ae3ed862276eba82ad21e8e
<sys/linux-syscalls.h> only contains constants for the syscalls
we're generating stubs for. We want all the syscalls available
on the architecture in question.
Keep using <sys/linux-syscalls.h> on ARM for now because the
__NR_ARM_set_tls and __NR_ARM_cacheflush values aren't in <asm/unistd.h>.
Change-Id: I66683950d87d9b18d6107d0acc0ed238a4496f44
We only need one logging API, and I prefer the one that does no
allocation and is thus safe to use in any context.
Also use O_CLOEXEC when opening the /dev/log files.
Move everything logging-related into one header file.
Change-Id: Ic1e3ea8e9b910dc29df351bff6c0aa4db26fbb58
The attached patch provides a new implementation of strcmp for ARM,
using LDRD instead of LDR whenever possible.
For older architectures that do not support LDRD, this implementation
uses the same algorithm as before.
Testing and benchmarking:
* Validation: successfully passes a test that compares different strings
of length 1-128 and offsets 0-8 from a word boundary. Checked on
qemu/A15/A9, ARM/Thumb mode, Big/Little Endian.
* Integration with gcc: no regression on qemu for arm-none-eabi --with-cpu
a15/a9 --with-mode arm/thumb.
Change-Id: I9e230e1b99dbdc9119b69ee858a89038c516a4ea
Signed-off-by: Vassilis Laganakos <vasileios.laganakos@arm.com>
The strategy for large block sizes is LDRD and STRD with offset addressing,
where the main loop copies 64 bytes in every iteration, (i.e., 8 calls to
LDRD and STRD pairs), interleaving load and stores (i.e., the pairs of LDRD
and STRD of the same data are consecutive instructions), and the writeback
of an updated address is a separate instruction, which allows us to write
back the accumulated update once per iteration.
This strategy is implemented in memcpy.S. In some configurations, a plain
version of memcpy (included from memcpy-stub.c) is used instead of the
optimized one.
Validation:
* Correctness: checked memcpy using a test harness for block sizes
ranging between 1 to 128, and source and destination buffers alignment
ranging in { 0,1,2,3,4,8,12 } bytes each.
* Performance: benchmarking on Cortex-A15 FPGA indicates that this strategy
is better for A15 than the strategy used by glibc and even slightly better
than using NEON. Benchmarking on Cortex-A9 bare metal and Linux shows
that the proposed strategy is reasonable: not as fast as the version of
memcpy from glibc (which is the best open source strategy for A9), but
comparable with csl and bionic.
* Integration with GCC: no regression for arm-none-eabi --with-cpu
cortex-a15 and cortex-a9.
Change-Id: Ied56354d8992c62ae3e02d582a2bd55585d814b9
Signed-off-by: Vassilis Laganakos <vasileios.laganakos@arm.com>
Fix the pthread_setname_np test to take into account that emulator kernels are
so old that they don't support setting the name of other threads.
The CLONE_DETACHED thread is obsolete since 2.5 kernels.
Rename kernel_id to tid.
Fix the signature of __pthread_clone.
Clean up the clone and pthread_setname_np implementations slightly.
Change-Id: I16c2ff8845b67530544bbda9aa6618058603066d
If r0 == 0, we're the child. If r0 > 0, we're the parent.
Otherwise set errno.
The __bionic_clone code I copy & pasted was wrong. This patch
fixes both.
Bug: 3461078
Change-Id: Ibb7d6cc7e54e666841f2f0dc59a141a0b31982e4
MIPS and x86 appear to have been correct already.
(Also fix unit tests that ASSERT_EQ with errno so that the
arguments are in the retarded junit order.)
Bug: 3461078
Change-Id: I2418ea98927b56e15b4ba9cfec97f5e7094c6291
There's now only one place where we deal with this stuff, it only needs to
be parsed once by the dynamic linker (rather than by each recipient), and it's
now easier for us to get hold of auxv data early on.
Change-Id: I6314224257c736547aac2e2a650e66f2ea53bef5
We had two copies of the backtrace code, and two copies of the
libcorkscrew /proc/pid/maps code. This patch gets us down to one.
We also had hacks so we could log in the malloc debugging code.
This patch pulls the non-allocating "printf" code out of the
dynamic linker so everyone can share.
This patch also makes the leak diagnostics easier to read, and
makes it possible to paste them directly into the 'stack' tool (by
using relative PCs).
This patch also fixes the stdio standard stream leak that was
causing a leak warning every time tf_daemon ran.
Bug: 7291287
Change-Id: I66e4083ac2c5606c8d2737cb45c8ac8a32c7cfe8