This memcpy code uses NEON/VFP to achieve very good performance
on ARMv7-A processors. It is specifically tuned for A15 but should
provide good performance on A9 also. It is equivalent to the code
in cortex-strings rev 116.
This patch is a follow up the existing gerrit change:
I7f6f77995f3ca903ad9c66d14261441667a2a935
But this version includes a tweak for performance on misaligned
buffers.
Change-Id: I285abac0068f8ae29a1cbf7862ea8590aadaf0a7
Signed-off-by: Will Newton <will.newton@linaro.org>
For some reason, socketcalls.c was only being compiled for ARM, where
it makes no sense. For x86 we generate stubs for the socket functions
that use __NR_socketcall directly.
Change-Id: I84181e6183fae2314ae3ed862276eba82ad21e8e
<sys/linux-syscalls.h> only contains constants for the syscalls
we're generating stubs for. We want all the syscalls available
on the architecture in question.
Keep using <sys/linux-syscalls.h> on ARM for now because the
__NR_ARM_set_tls and __NR_ARM_cacheflush values aren't in <asm/unistd.h>.
Change-Id: I66683950d87d9b18d6107d0acc0ed238a4496f44
We only need one logging API, and I prefer the one that does no
allocation and is thus safe to use in any context.
Also use O_CLOEXEC when opening the /dev/log files.
Move everything logging-related into one header file.
Change-Id: Ic1e3ea8e9b910dc29df351bff6c0aa4db26fbb58
The attached patch provides a new implementation of strcmp for ARM,
using LDRD instead of LDR whenever possible.
For older architectures that do not support LDRD, this implementation
uses the same algorithm as before.
Testing and benchmarking:
* Validation: successfully passes a test that compares different strings
of length 1-128 and offsets 0-8 from a word boundary. Checked on
qemu/A15/A9, ARM/Thumb mode, Big/Little Endian.
* Integration with gcc: no regression on qemu for arm-none-eabi --with-cpu
a15/a9 --with-mode arm/thumb.
Change-Id: I9e230e1b99dbdc9119b69ee858a89038c516a4ea
Signed-off-by: Vassilis Laganakos <vasileios.laganakos@arm.com>
The strategy for large block sizes is LDRD and STRD with offset addressing,
where the main loop copies 64 bytes in every iteration, (i.e., 8 calls to
LDRD and STRD pairs), interleaving load and stores (i.e., the pairs of LDRD
and STRD of the same data are consecutive instructions), and the writeback
of an updated address is a separate instruction, which allows us to write
back the accumulated update once per iteration.
This strategy is implemented in memcpy.S. In some configurations, a plain
version of memcpy (included from memcpy-stub.c) is used instead of the
optimized one.
Validation:
* Correctness: checked memcpy using a test harness for block sizes
ranging between 1 to 128, and source and destination buffers alignment
ranging in { 0,1,2,3,4,8,12 } bytes each.
* Performance: benchmarking on Cortex-A15 FPGA indicates that this strategy
is better for A15 than the strategy used by glibc and even slightly better
than using NEON. Benchmarking on Cortex-A9 bare metal and Linux shows
that the proposed strategy is reasonable: not as fast as the version of
memcpy from glibc (which is the best open source strategy for A9), but
comparable with csl and bionic.
* Integration with GCC: no regression for arm-none-eabi --with-cpu
cortex-a15 and cortex-a9.
Change-Id: Ied56354d8992c62ae3e02d582a2bd55585d814b9
Signed-off-by: Vassilis Laganakos <vasileios.laganakos@arm.com>
Fix the pthread_setname_np test to take into account that emulator kernels are
so old that they don't support setting the name of other threads.
The CLONE_DETACHED thread is obsolete since 2.5 kernels.
Rename kernel_id to tid.
Fix the signature of __pthread_clone.
Clean up the clone and pthread_setname_np implementations slightly.
Change-Id: I16c2ff8845b67530544bbda9aa6618058603066d
If r0 == 0, we're the child. If r0 > 0, we're the parent.
Otherwise set errno.
The __bionic_clone code I copy & pasted was wrong. This patch
fixes both.
Bug: 3461078
Change-Id: Ibb7d6cc7e54e666841f2f0dc59a141a0b31982e4
MIPS and x86 appear to have been correct already.
(Also fix unit tests that ASSERT_EQ with errno so that the
arguments are in the retarded junit order.)
Bug: 3461078
Change-Id: I2418ea98927b56e15b4ba9cfec97f5e7094c6291
There's now only one place where we deal with this stuff, it only needs to
be parsed once by the dynamic linker (rather than by each recipient), and it's
now easier for us to get hold of auxv data early on.
Change-Id: I6314224257c736547aac2e2a650e66f2ea53bef5
We had two copies of the backtrace code, and two copies of the
libcorkscrew /proc/pid/maps code. This patch gets us down to one.
We also had hacks so we could log in the malloc debugging code.
This patch pulls the non-allocating "printf" code out of the
dynamic linker so everyone can share.
This patch also makes the leak diagnostics easier to read, and
makes it possible to paste them directly into the 'stack' tool (by
using relative PCs).
This patch also fixes the stdio standard stream leak that was
causing a leak warning every time tf_daemon ran.
Bug: 7291287
Change-Id: I66e4083ac2c5606c8d2737cb45c8ac8a32c7cfe8
If the platform code is compiled with -mcpu=cortex-a15, then without this
change prebuilt libraries built against -march=armv7 cannot resolve the
dependency on __aeabi_idiv (provided by libgcc.a).
Bug: 7961327
cherry-picked from internal master.
Change-Id: I8fe59a98eb53d641518b882523c1d6a724fb7e55
Adds new code to function memset, optimized for Cortex A9.
Copyright (C) ST-Ericsson SA 2010
Added neon implementation
Author: Henrik Smiding henrik.smiding@stericsson.com for ST-Ericsson.
Change-Id: Id3c87767953439269040e15bd30a27aba709aef6
Signed-off-by: Christian Bejram <christian.bejram@stericsson.com>
Adds new code to memcpy function, optimized for Cortex A9.
Adds new ARM-only loop, for operations where source and
destination are aligned.
Copyright (C) ST-Ericsson SA 2010
Modified neon implementation to fit Cortex A9 cache line size,
for those running 32 bytes L2 cache line size.
Also split the implementation in aligned and unaligned access,
for those that allows unaligned memory access with Neon.
For totally aligned operations, arm-only code is used.
Change-Id: I95ebf6164cd6486b12a7e3e98e369db21e7e18d2
Author: Henrik Smiding henrik.smiding@stericsson.com for ST-Ericsson.
Signed-off-by: Christian Bejram <christian.bejram@stericsson.com>
Legacy ARM shared libraries use this generic version of atexit(),
which queues exit functions for invocation at program exit, at
which time the library may have been dlclose()'d, causing the
program to crash.
Change-Id: I41ae153c23268daa65ede7fb8966fc3e9caec369
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@gmail.com>
To properly support legacy ARM shared libraries, libc.so needs
to export the symbols __dso_handle and atexit, even though
these are now supplied by the crt startup code.
This patch reshuffles the existing CRT_LEGACY_WORKAROUND
conditionally compiled code slightly so it works as the
original author likely intended.
Change-Id: Id6c0e94dc65b7928324a5f0bad7eba6eb2f464b9
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@gmail.com>
The runtime linker parses the ELF section headers to
discover the size of the init_array and fini_array, so
there is no point in putting NULL terminators at the end.
Change-Id: I3246cd585efce9314155600277dd829e9f37d04f
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@gmail.com>
None of the supported ARCHs actually populate these sections,
so there is no point in keeping them in the binaries.
Change-Id: I21a364f510118ac1114e1b49c53ec8c895c6bc6b
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@gmail.com>
Useful if you're trying to defeat ASLR, otherwise not
so much ...
Change-Id: I17ebb50bb490a3967db9c3038f049adafe2b8ea7
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@gmail.com>
This header is used on bionic build and should be propagated into
sysroot on toolchain rebuild. Discussion re. this header is here:
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00936.html
It is available already in mips NDK platforms:
development/ndk/platforms/android-9/arch-mips/include/link.h
Change-Id: I39ff467cdac9f448e31c11ee3e14a6200e82ab57
Signed-off-by: Pavel Chupin <pavel.v.chupin@intel.com>
Add a GNU_STACK marker to crtend* files. This tells the linker
that these files do not require an executable stack.
When linking, a missing GNU_STACK marker in any .o file can prevent
the compiler from automatically marking the final executable as NX
safe (executable stack not required). In Android, we normally work
around this by adding -Wa,--noexecstack / -Wl,-z,noexecstack.
For files like crtend.S / crtend_so.S, which are included in every
executable / shared library, it's better to add the GNU_STACK note
directly to the assembly file. This allows the compiler to
automatically mark the final executable as NX safe without any
special command line options.
References: http://www.gentoo.org/proj/en/hardened/gnu-stack.xml
Change-Id: I07bd058f9f60ddd8b146e0fb36ba26ff84c0357d
Move the stackpointer so a captured signal does not corrupt
stack variables needed for __thread_entry.
Change-Id: I3e1e7b94a6d7cd3a07081f849043262743aa8064
Rewrite
crtbegin.S -> crtbegin.c
crtbegin_so.S -> crtbegin_so.c
This change allows us to generate PIC code without relying
on text relocations.
As a consequence of this rewrite, also rewrite
__dso_handle.S -> __dso_handle.c
__dso_handle_so.S -> __dso_handle_so.c
atexit.S -> atexit.c
In crtbegin.c _start, place the __PREINIT_ARRAY__, __INIT_ARRAY__,
__FINI_ARRAY__, and __CTOR_LIST__ variables onto the stack, instead of
passing a pointer to the text section of the binary.
This change appears sorta wonky, as I attempted to preserve,
as much as possible, the structure of the original assembly.
As a result, you have C files including other C files, and other
programming uglyness.
Result: This change reduces the number of files with text-relocations
from 315 to 19 on my Android build.
Before:
$ scanelf -aR $OUT/system | grep TEXTREL | wc -l
315
After:
$ scanelf -aR $OUT/system | grep TEXTREL | wc -l
19
Change-Id: Ib9f98107c0eeabcb606e1ddc7ed7fc4eba01c9c4
crtbegin_dynamic and crtbegin_static are essentially identical,
minus a few trivial differences (comments and whitespace).
Eliminate duplicates.
Change-Id: Ic9fae6bc9695004974493b53bfc07cd3bb904480
Adds new code to function memcmp, optimized for Cortex A9.
Copyright (C) ST-Ericsson SA 2010
Added neon optimization
Change-Id: I8864d277042db40778b33232feddd90a02a27fb0
Author: Henrik Smiding henrik.smiding@stericsson.com for ST-Ericsson.
Signed-off-by: Christian Bejram <christian.bejram@stericsson.com>
Modify the dynamic linker so that executables can be loaded
at locations other than 0x00000000.
Modify crtbegin* so that non-PIC compilant "thumb interwork
veneers" are not created by the linker.
Bug: 5323301
Change-Id: Iece0272e2b708c79034f302c20160e1fe9029588
Some SoCs that support NEON nevertheless perform better with a non-NEON than a
NEON memcpy(). This patch adds build variable ARCH_ARM_USE_NON_NEON_MEMCPY,
which can be set in BoardConfig.mk. When ARCH_ARM_USE_NON_NEON_MEMCPY is
defined, we compile in the non-NEON optimized memcpy() even if the SoC supports
NEON.
Change-Id: Ia0e5bee6bad5880ffc5ff8f34a1382d567546cf9
So that we can always get the full stack trace regardless of gcc's handling
of the "noreturn" attribute associated with abort().
(Cherry pick of Id264a5167e7cabbf11515fbc48f5469c527e34d4.)
Bug: 6455193
Conflicts:
libc/Android.mk
Change-Id: I568fc5303fd1d747075ca933355f914122f94dac
So that we can always get the full stack trace regardless of gcc's handling
of the "noreturn" attribute associated with abort().
[cherry-picked from master]
BUG:6455193
Change-Id: I0102355f5bf20e636d3feab9d1424495f38e39e2
ARM Cortex A8 use 64 bytes and ARM Cortex A9 use 32 bytes cache line
size.
The following patch:
Adds code to adjust memcpy cache line size to match A9 cache line
size.
Adds a flag to select between 32 bytes and 64 bytes cache line
size.
Copyright (C) ST-Ericsson SA 2010
Modified neon implementation to fit Cortex A9 cache line size
Author: Henrik Smiding henrik.smiding@stericsson.com for
ST-Ericsson.
Change-Id: I8a55946bfb074e6ec0a14805ed65f73fcd0984a3
Signed-off-by: Christian Bejram <christian.bejram@stericsson.com>
The END macro was put too far down which made the linker complain about
it. Move up to the end of the code.
Change-Id: Ica71a9c6083b437d2213c7cefe34b0083c78f16b
Marking segments read-only was pushing the alignment of __on_dlclose by
2 bytes making it unaligned. This change makes sure the ARM code is
aligned to the 4 byte boundary.
Bug: 6313309
Change-Id: Ic2bf475e120dd61225ec19e5d8a9a8b1d0b7f081
This change fixes a segmentation fault in the libc unwinder when it goes
past __libc_init.
Unwind instructions for __libc_init direct it to grab the return address from
the stack frame. Without this change, the unwinder gets a wild address and
looks up further unwind instructions for the routine at that address. If it's
unlucky enough to hit an existing function, it will try to unwind it. Bad
things happen then.
With this change, the return address always points to the _start function,
which does not have unwind instructions associated with it. This stop the
unwind process.
__libc_init never returns, so this does not affect program execution, other
than adding 4 bytes on the main thread stack.
Change-Id: Id58612172e8825c8729cccd081541a13bff96bd0
Use the same pattern in atexit.S to reference __dso_handle in a way that
doesn't require a TEXTREL flag to be set.
Change-Id: Id69d20863ee203d2b2f7ef0db230f9b548657741
Some platform libraries built for ICS do not work with master
because of some refactoring in frameworks/base.
Make sure that these libgcc symbols are always present in our libc
Change-Id: Ib8d345878be0ba711f051082a778f5cc1f1b3a19
Signed-off-by: Dima Zavin <dima@android.com>
The function must be named __atomic_cmpxchg, not __android_cmpxchg.
This typo broke existing prebuilt binaries (they couldn't be loaded
at runtime anymore).
Change-Id: I25ca7d18329817f0056e616a0409113269ad7b1f
Use tgkill instead of tkill to implement pthread_kill.
This is safer in the event that the thread has already terminated
and its id has been reused by a different process.
Change-Id: Ied715e11d7eadeceead79f33db5e2b5722954ac9
__atomic_cmpxchg and other related atomic operations did not
provide memory barriers, which can be a problem for non-platform
code that links against them when it runs on multi-core devices.
This patch does two things to fix this:
- It modifies the existing implementation of the functions
that are exported by the C library to always provide
full memory barriers. We need to keep them exported by
the C library to prevent breaking existing application
machine code.
- It also modifies <sys/atomics.h> to only export
always-inlined versions of the functions, to ensure that
any application code compiled against the new header will
not rely on the platform version of the functions.
This ensure that said machine code will run properly on
all multi-core devices.
This is based on the GCC built-in sync primitives.
The end result should be only slightly slower than the
previous implementation.
Note that the platform code does not use these functions
at all. A previous patch completely removed their usage in
the pthread and libstdc++ code.
+ rename arch-arm/bionic/atomics_arm.S to futex_arm.S
+ rename arch-x86/bionic/atomics_x86.S to futex_x86.S
+ remove arch-x86/include/sys/atomics.h which already
provided inlined functions to the x86 platform.
Change-Id: I752a594475090cf37fa926bb38209c2175dda539
Modify the dynamic linker so that executables can be loaded
at locations other than 0x00000000.
Modify crtbegin* so that non-PIC compilant "thumb interwork
veneers" are not created by the linker.
Bug: 5323301
Change-Id: Iece0272e2b708c79034f302c20160e1fe9029588
Without this change strcmp size is zero (not set), and it gets
ignored by Valgrind. Changes to memcpy and atexit don't affect the
generated binary in any way.
Change-Id: I05818cb5951f75901dc8c0eef02807a2e83a9231
This patch ensure that __aeabi_f2uiz is embedded in our C library.
This is needed to avoid breaking certain applications when they are
loaded in ICS. It is likely that the issue is due to mis-linked
binaries generated with the stand-alone toolchain (the problem
should not exist if you use ndk-build), but this fix is easier
than asking all app developers to fix their custom build system.
If you want more technical details, read the comments inside
libgcc_compat.c
Change-Id: I59ac1fc781ecb70b90b5573c5a3c67560ca8f270
Unfortunately, legacy .so files for ARM don't have a correct crtbegin file.
Consequently, we have to grandfather the old __dso_handle behaviour.
Add some ifdefs for ARM to allow it to use the old code until we can work
out a transition.
Change-Id: I6a28f368267d792c94e1d985d8344023bc632f6f
Author: H.J. Lu <hongjiu.lu@intel.com>
Signed-off-by: Bruce Beare <bruce.j.beare@intel.com>
Reference results of the experiments on TI OMAP3430 at 600 MHz
$ bench_strcmp -N "strcmp_1k" -s 1k -I 200
[original C code]
prc thr usecs/call samples errors cnt/samp size
strcmp_1k 1 1 10.38000 102 0 15000 1024
[ARM optimized code]
prc thr usecs/call samples errors cnt/samp size
strcmp_1k 1 1 3.08840 88 0 15000 1024
The work was derived from ARM Ltd, contributed to newlib, and reworked
for Android by Linaro.
Change-Id: Ib0d5755e1eb9adb07d80ef0252f57a5c4c57a425
Signed-off-by: Jim Huang <jserv@0xlab.org>
Add a macro to annotate function end and start using both ENTRY and END
for each function. This allows valgrind (and presumably other debugging
tools) to use the debug symbols to trace the functions.
Change-Id: I5f09cef8e22fb356eb6f5cee952b031e567599b6
With this patch _and_ an upcoming build/ patch, the destruction
of static C++ objects contained in shared libraries will happen
properly when dlclose() is called.
Note that this change introduces crtbegin_so.S and crtend_so.S which
are currently ignored by the build system.
+ move definition of __dso_handle to the right place
(before that, all shared libraries used the __dso_handle
global variable from the C library).
Note that we keep a 'weak' __dso_handle in aeabi.c to avoid
breaking the build until the next patch to build/core/combo/
appears. We will be able to remove that later.
+ move bionic/aeabi.c to arch-arm/bionic/ (its proper location)
NOTE: The NDK will need to be modified to enable this feature in
the shared libraries that are generated through it.
Change-Id: I99cd801375bbaef0581175893d1aa0943211b9bc
With this patch, _and_ an upcoming build/ patch, the destruction
of static C++ objects contained in shared libraries will happen
properly when dlclose() is called.
Note that this change introduces crtbegin_so.S and crtend_so.S which
are currently ignored by the build system.
+ move definition of __dso_handle to the right place
(before that, all shared libraries used the __dso_handle
global variable from the C library).
Note that we keep a 'weak' __dso_handle in aeabi.c to avoid
breaking the build until the next patch to build/core/combo/
appears. We will be able to remove that later.
+ move bionic/aeabi.c to arch-arm/bionic/ (its proper location)
Change-Id: Ie771aa204e3acbdf02fd30ebd4150373a1398f39
NOTE: The NDK will need to be modified to enable this feature in
the shared libraries that are generated through it.
Update ARM atomic ops to use LDREX/STREX. Stripped out #if 0 chunk.
Insert explicit memory barriers in pthread and semaphore code.
For bug 2721865.
Change-Id: I0f153b797753a655702d8be41679273d1d5d6ae7
Added an underscore to _ARM_HAVE_LDREX_STREX to make it match the others.
Added __ARM_HAVE_DMB and __ARM_HAVE_LDREXD when appropriate.
Fixed some typos.
Change-Id: I2f55febcff4aeb7de572a514fb2cd2f820dca27c
GDB looks for specific opcode sequences when trying to recognize a stack
frame as a signal trampoline. The sequences it looks for happen to be those
created when SA_RESTORER is set, since glibc always sets a restorer. This
patch does the same here, so that the trampolines can be correctly identified.
Change-Id: I0ac574a68818cb24d939c3527f3aaeb04b853d04
This does not change the implementation of conditional variables
since we're waiting for other system components to properly use
pthread_condattr_init/setpshared before that.
Also remove an obsolete x86 source file.
Change-Id: Ia3e3fbac35b87a534fb04d4381c3c66b975bc8f7
Private futexes are a recent kernel addition: faster futexes that cannot be
shared between processes. This patch uses them by default, unless the PROCESS_SHARED
attribute flag is used when creating a mutex and/or conditional variable.
Also introduces pthread_condattr_init/destroy/setpshared/getpshared.
Change-Id: I3a0e2116f467072b046524cb5babc00e41057a53
Only provide an implementation for ARM at the moment, since
it requires specific assembly fragments (the standard syscall
stubs cannot be used because the child returns in a different
stack).
Merge commit '76ef331cd6967ca8f5af779d25c8b634f8cdd2b6' into eclair-mr2-plus-aosp
* commit '76ef331cd6967ca8f5af779d25c8b634f8cdd2b6':
use local symbols in memset so it doesn't screw up profiling
Merge commit '7e7d6c48a064af82f0ec39f47b9eb803a6e1df4c' into eclair-plus-aosp
* commit '7e7d6c48a064af82f0ec39f47b9eb803a6e1df4c':
use local symbols in memset so it doesn't screw up profiling
Do not submit this patch before the one that modifies the Android emulator to
work-around a weird ARMv7 emulation issue. This is done to temporarily re-allow
the -user builds needed for QA.
This is required to work-around some corny bugs in ARMv7 emulation.
The emulation itself is required to run the dex pre-optimization pass
for -user builds.
Merge commit '7a9e06fa7e4e533074cde314f25dff3024f34a5d' into eclair-plus-aosp
* commit '7a9e06fa7e4e533074cde314f25dff3024f34a5d':
Fix ABI breakage in libc.so and libm.so between 1.6 and Eclair.
372 MB/s for large transfers, 440 MB/s for smaller ones down to 1KB. 130 MB/s for very small transfers ( < 32 bytes )
Performance is similar with non-congruent buffers.
ARMv6 onwards. These architectures provide the load-linked, store-conditional pair of ldrex/strex whose use
is recommended in place of 'swp'. Also, the description of the 'swp' instruction in the ARMv6 reference
manual states that the swap operation does not include any memory barrier guarantees.This fix attempts to
address these issues by providing an atomic swap implementation using ldrex/strex under _ARM_HAVE_LDREX_STREX
macro. This Fix is verified on ST Ericsson's U8500 platform and Submitted on behalf of a third-party:
Surinder-pal SINGH from STMicroelectronics.
The problem was due to the fact that, in the case of dynamic executables,
the dynamic linker calls the DT_PREINIT_ARRAY, DT_INIT and DT_INIT_ARRAY
constructors when loading shared libraries and dynamic executables,
*before* calling the executable's entry point (i.e. arch-$ARCH/bionic/crtbegin_dynamic.c)
which in turns call __libc_init() in libc.so, as defined by bionic/libc_init_dynamic.c
The latter did call these constructors array again, mistakenly.
The patch also updates the documentation of many related functions.
Also adds a new section to linker/README.TXT explaining restrictions on
C library usage.
The patch has been tested on a Dream for stability issues with
proprietary blobs:
- H264 decoding works
- Camera + Video recording works
- GPS works
- Sensors work
The tests in system/extra/tests/bionic/libc/common/test_static_cpp_mutex.cpp has been
run and shows the static C++ constructor being called only once.