Commit graph

33 commits

Author SHA1 Message Date
Chih-Hung Hsieh
5cb4d9cef9 Fix "deprecated instruction in IT block" warning
Bug: 179266557
Test: make with new clang compiler
Change-Id: I963609861659cbb2be8ed467654109938185c747
2021-02-03 20:54:52 -08:00
Kyle Repinski
78da73adad libc: Optimize ARM memcmp by using NEON.
Because NEON_UNALIGNED_ACCESS has never been defined, it has gone unused.
This change enables NEON optimization if __ARM_NEON__ is defined.

Test: bionic-benchmarks-32 BM_string_memcmp

On Nextbit Robin (MSM8992), here are the results

Before:
                                       iterations      ns/op
BM_string_memcmp/8                            50M         28    0.277 GiB/s
BM_string_memcmp/64                           50M         54    1.169 GiB/s
BM_string_memcmp/512                           5M        444    1.151 GiB/s
BM_string_memcmp/1024                          2M        885    1.156 GiB/s
BM_string_memcmp/8Ki                         200k       7401    1.107 GiB/s
BM_string_memcmp/16Ki                        200k      14469    1.132 GiB/s
BM_string_memcmp/32Ki                        100k      28726    1.141 GiB/s
BM_string_memcmp/64Ki                         50k      57480    1.140 GiB/s

After:
                                       iterations      ns/op
BM_string_memcmp/8                            50M         22    0.351 GiB/s
BM_string_memcmp/64                         1000k         17    3.688 GiB/s
BM_string_memcmp/512                          20M        105    4.848 GiB/s
BM_string_memcmp/1024                         10M        190    5.367 GiB/s
BM_string_memcmp/8Ki                        1000k       1496    5.475 GiB/s
BM_string_memcmp/16Ki                       1000k       2746    5.966 GiB/s
BM_string_memcmp/32Ki                        500k       5481    5.978 GiB/s
BM_string_memcmp/64Ki                        200k      10971    5.973 GiB/s

Change-Id: I3c76ce7fa2796872e0171d5502b0ebd6e2893339
2018-12-22 01:04:27 +00:00
Haibo Huang
ea9957a72a Arm32 dynamic function dispatch
Previous change was reverted in 9690b121e3.
This change added .arch directive to kryo/ to avoid invalid instruction error.

Test: Run bionic unit test.
Test: Use gdb to make sure the right function is selected.
Test: Build previously failed target: make PRODUCT-sdk_phone_arm64-sdk
Change-Id: I14de41851121fc1a0b38c98fda5eb844b6a9695c
2018-11-26 20:00:55 +00:00
Izabela Orlowska
9690b121e3 Revert "Arm32 dynamic function dispatch"
This reverts commit ce4ff9c44d.

Reason for revert: broke master in ab/5138164 target sdk_phone_armv7-sdk

Change-Id: Ia4b0c7e6117a37df694509078116963f41d7865e
2018-11-19 14:14:41 +00:00
Haibo Huang
ce4ff9c44d Arm32 dynamic function dispatch
Test: Run bionic unit test.
Test: Use gdb to make sure the right function is selected.

Change-Id: I34ccd83d472c13993f75672b1aac2b2eae65c499
2018-11-16 19:50:17 +00:00
Haibo Huang
8a0f0ed5e7 Make memcpy memmove
Bug: http://b/63992911
Test: Change BoardConfig.mk and compile for each variant
Change-Id: Ia0cc68d8e90e3316ddb2e9ff1555a009b6a0c5be
2018-06-11 18:12:45 +00:00
Elliott Hughes
da46caee09 Add generic arm non-neon memmove.
From OpenBSD.

Bug: http://b/63992911
Test: ran tests
Change-Id: If7d9166922776cdc9333ff04205f9c6312a812b3
2018-05-24 14:57:15 -07:00
George Burgess IV
9024235005 Remove __overloadable/__RENAME_CLANG
Now that we have a clang that supports transparent overloads, we can
kill all of this cruft, and restore our upstream sources to their
untouched glory. Woohoo!

Bug: 12231437
Test: Built aosp_marlin; no obvious patch-related aosp_mips issues.
Change-Id: I520a19d014f12137f80e43f973dccd6711c571cd
2018-02-06 13:35:56 -08:00
Isaac Chen
b890336756 Merge "Build support for 32-bit armv8-a" 2017-08-17 07:19:33 +00:00
Isaac Chen
d544fb7da0 Build support for 32-bit armv8-a
The assembly in arm's generic strlen implementation contains deprecated
instruction sequence:
    it      eq  ; start of IT block
    ldreq   ... ; 32-bit T32 insruction in IT block deprecated in armv8
This will cause compiler error because of -Winline-asm and -Werror.

The fix here is to change the sequence:
    it      eq
    ldreq   ...
    bne     1f
to equivalent sequence:
    bne     1f
    ldr     ...
The resulted sequence is (1 instruction) shorter.

See ARM for ARMv8 for details:
F6.2 Partial Deprecation of IT
... All uses of IT that apply to instructions other than a single
subsequent 16-bit instruction from a restricted set are deprecated, ...

Bug: 62895439
Test: "bionic-unit-tests-static --gtest_filter=*strlen*" on Nexus 4
      (krait), emulator (armv7), and sailfish (armv8).
      The test binary for the first 2 is built with armv7-a as its
      TARGET_CPU_ARCH; The test binary for the last is built with
      armv8-a as its TARGET_2ND_CPU_ARCH.
      TARGET(_2ND)_CPU_VARIANTs of both binaries are set to "generic".

Change-Id: Ia2208b4e2ba2cad589975bc7b4617688cbb8826a
2017-08-14 06:05:20 +00:00
George Burgess IV
d34b0a946c libc: Move FORTIFY into one file; make style fixups
This addresses post-commit feedback from
I88c39ca166bacde0b692aa3063e743bb046a5d2f. With this, our FORTIFY impl
now sits in one file.

Bug: 12231437
Test: mma; no new CtsBionicTestCases failures on bullhead internal
master.
Change-Id: I6f9ff81c3e86cf9d6a0efa650eb5765f1e2fa09c
2017-07-25 17:39:21 -07:00
George Burgess IV
6cb0687932 Split our FORTIFY implementation into libc_fortify
As requested in the bug. This also rips __memcpy_chk out of memcpy.S,
which lets us cut down on copypasta (all of the implementations look
identical).

Bug: 12231437
Test: mma on aosp_{arm,arm64,mips,x86,x86_64} internal master;
checkbuild on bullhead internal master; CtsBionicTestCases on bullhead.
No new failures.
Change-Id: I88c39ca166bacde0b692aa3063e743bb046a5d2f
2017-07-24 14:20:16 -07:00
George Burgess IV
7cc779f15c libc: add clang FORTIFY support
This patch adds clang-style FORTIFY to Bionic. For more information on
FORTIFY, please see https://goo.gl/8HS2dW . This implementation works
for versions of clang that don't support diagnose_if, so please see the
"without diagnose_if" sections. We plan to swap to a diagnose_if-based
FORTIFY later this year (since it doesn't really add any features; it
just simplifies the implementation a lot, and it gives us much prettier
diagnostics)

Bug: 32073964
Test: Builds on angler, bullhead, marlin, sailfish. Bionic CTS tests
pass on Angler and Bullhead.

Change-Id: I607aecbeee81529709b1eee7bef5b0836151eb2b
2017-02-09 15:49:32 -08:00
Elliott Hughes
382bd666e2 Stop including <machine/cpu-features.h>.
We're not looking at __ARM_ARCH__, because we don't support ARMv6.

Bug: http://b/18556103
Change-Id: I91fe096af697dc842a57e97515312e3530743678
2016-05-16 17:52:40 -07:00
Elliott Hughes
01d5b946ac Remove optimized code for bzero, which was removed from POSIX in 2008.
I'll come back for the last bcopy remnant...

Bug: http://b/26407170
Change-Id: Iabfeb95fc8a4b4b3992e3cc209ec5221040e7c26
2016-03-02 17:21:07 -08:00
Elliott Hughes
3c6016f04a Improve diagnostics from the assembler __memcpy_chk routines.
Change-Id: Iec16c92ed80beee505cba2121ea33e3550197b02
2016-03-01 14:45:58 -08:00
Elliott Hughes
62e59646f8 Improve diagnostics from the assembler __memset_chk routines.
Change-Id: Ic165043ab8cd5e16866b3e11cfba960514cbdc57
2016-03-01 12:46:47 -08:00
Elliott Hughes
b83d6747fa Improve FORTIFY failure diagnostics.
Our FORTIFY _chk functions' implementations were very repetitive and verbose
but not very helpful. We'd also screwed up and put the SSIZE_MAX checks where
they would never fire unless you actually had a buffer as large as half your
address space, which probably doesn't happen very often.

Factor out the duplication and take the opportunity to actually show details
like how big the overrun buffer was, or by how much it was overrun.

Also remove the obsolete FORTIFY event logging.

Also remove the unused __libc_fatal_no_abort.

This change doesn't improve the diagnostics from the optimized assembler
implementations.

Change-Id: I176a90701395404d50975b547a00bd2c654e1252
2016-02-26 22:06:17 -08:00
Christopher Ferris
e1e434af12 Replace bx lr with update of pc from the stack.
When there is arm assembler of this format:

ldmxx sp!, {..., lr} or pop {..., lr}
bx lr

It can be replaced with:

ldmxx sp!, {..., pc} or pop {..., pc}

Change-Id: Ic27048c52f90ac4360ad525daf0361a830dc22a3
2015-07-08 11:20:27 -07:00
Chih-Hung Hsieh
33f33515b5 Use unified syntax to compile with both llvm and gcc.
All arch-arm and arch-arm64 .S files were compiled
by gcc with and without this patch. The output object files
were identical. When compiled with llvm and this patch,
the output files were also identical to gcc's output.

BUG: 18061004
Change-Id: I458914d512ddf5496e4eb3d288bf032cd526d32b
2015-05-11 17:15:03 -07:00
Elliott Hughes
1ef6ec40e1 Move the generic arm memcmp.S into the generic directory.
Change-Id: I48e4d14a0dcddbb246edbac6d0329619574ab44d
2014-12-15 11:06:34 -08:00
Christopher Ferris
7123d4371a Fix generic __memcpy_chk implementation.
- Clean up the labels (add .L to make them local).
- Change to using cfi directives.
- Fix unwinding of the __memcpy_chk fail path.

Bug: 18033671
Change-Id: I12845f10c7ce5e6699c15c558bda64c83f6a392a
2014-10-17 14:44:36 -07:00
Elliott Hughes
851e68a240 Unify our assembler macros.
Our <machine/asm.h> files were modified from upstream, to the extent
that no architecture was actually using the upstream ENTRY or END macros,
assuming that architecture even had such a macro upstream. This patch moves
everyone to the same macros, with just a few tweaks remaining in the
<machine/asm.h> files, which no one should now use directly.

I've removed most of the unused cruft from the <machine/asm.h> files, though
there's still rather a lot in the mips/mips64 ones.

Bug: 12229603
Change-Id: I2fff287dc571ac1087abe9070362fb9420d85d6d
2014-02-20 13:51:26 -08:00
Elliott Hughes
c54ca40aef Clean up some ARMv4/ARMv5 cruft.
Change-Id: I29e836fea4b53901e29f96c6888869c35f6726be
2013-12-13 14:02:30 -08:00
Elliott Hughes
68b67113a4 'Avoid confusing "read prevented write" log messages' 2.
This time it's assembler.

Change-Id: Iae6369833b8046b8eda70238bb4ed0cae64269ea
2013-10-15 17:17:05 -07:00
Elliott Hughes
eb847bc866 Fix x86_64 build, clean up intermediate libraries.
The x86_64 build was failing because clone.S had a call to __thread_entry which
was being added to a different intermediate .a on the way to making libc.so,
and the linker couldn't guarantee statically that such a relocation would be
possible.

  ld: error: out/target/product/generic_x86_64/obj/STATIC_LIBRARIES/libc_common_intermediates/libc_common.a(clone.o): requires dynamic R_X86_64_PC32 reloc against '__thread_entry' which may overflow at runtime; recompile with -fPIC

This patch addresses that by ensuring that the caller and callee end up in the
same intermediate .a. While I'm here, I've tried to clean up some of the mess
that led to this situation too. In particular, this removes libc/private/ from
the default include path (except for the DNS code), and splits out the DNS
code into its own library (since it's a weird special case of upstream NetBSD
code that's diverged so heavily it's unlikely ever to get back in sync).

There's more cleanup of the DNS situation possible, but this is definitely a
step in the right direction, and it's more than enough to get x86_64 building
cleanly.

Change-Id: I00425a7245b7a2573df16cc38798187d0729e7c4
2013-10-09 16:00:17 -07:00
Nick Kralevich
6861c6f85e Make error messages even better!
Change-Id: I72bd1eb1d526dc59833e5bc3c636171f7f9545af
2013-10-04 11:43:30 -07:00
Christopher Ferris
59a13c122e Optimize __memset_chk, __memcpy_chk. DO NOT MERGE.
This change creates assembler versions of __memcpy_chk/__memset_chk
that is implemented in the memcpy/memset assembler code. This change
avoids an extra call to memcpy/memset, instead allowing a simple fall
through to occur from the chk code into the body of the real
implementation.

Testing:

- Ran the libc_test on __memcpy_chk/__memset_chk on all nexus devices.
- Wrote a small test executable that has three calls to __memcpy_chk and
  three calls to __memset_chk. First call dest_len is length + 1. Second
  call dest_len is length. Third call dest_len is length - 1.
  Verified that the first two calls pass, and the third fails. Examined
  the logcat output on all nexus devices to verify that the fortify
  error message was sent properly.
- I benchmarked the new __memcpy_chk and __memset_chk on all systems. For
  __memcpy_chk and large copies, the savings is relatively small (about 1%).
  For small copies, the savings is large on cortex-a15/krait devices
  (between 5% to 30%).
  For cortex-a9 and small copies, the speed up is present, but relatively
  small (about 3% to 5%).
  For __memset_chk and large copies, the savings is also small (about 1%).
  However, all processors show larger speed-ups on small copies (about 30% to
  100%).

Bug: 9293744

Merge from internal master.

(cherry-picked from 7c860db074)

Change-Id: I916ad305e4001269460ca6ebd38aaa0be8ac7f52
2013-08-14 18:14:43 -07:00
Christopher Ferris
4e24dcc8d8 Optimize strcat/strcpy, small tweaks to strlen. DO NOT MERGE
Create one version of strcat/strcpy/strlen for cortex-a15/krait and another
version for cortex-a9.

Tested with the libc_test strcat/strcpy/strlen tests.
Including new tests that verify that the src for strcat/strcpy do not
overread across page boundaries.

NOTE: The handling of unaligned strcpy (same code in strcat) could probably
be optimized further such that the src is read 64 bits at a time instead of
the partial reads occurring now.

strlen improves slightly since it was recently optimized.

Performance improvements for strcpy and strcat (using an empty dest string):

cortex-a9
- Small copies vary from about 5% to 20% as the size gets above 10 bytes.
- Copies >= 1024, about a 60% improvement.
- Unaligned copies, from about 40% improvement.

cortex-a15
- Most small copies exhibit a 100% improvement, a few copies only
  improve by 20%.
- Copies >= 1024, about 150% improvement.
- Unaligned copies, about 100% improvement.

krait
- Most small copies vary widely, but on average 20% improvement, then
  the performance gets better, hitting about a 100% improvement when
  copies 64 bytes of data.
- Copies >= 1024, about 100% improvement.
- When coping MBs of data, about 50% improvement.
- Unaligned copies, about 90% improvement.

As strcat destination strings get larger in size:

cortex-a9
- about 40% improvement for small dst strings (>= 32).
- about 250% improvement for dst strings >= 1024.

cortex-a15
- about 200% improvement for small dst strings (>=32).
- about 250% improvement for dst strings >= 1024.

krait
- about 25% improvement for small dst strings (>=32).
- about 100% improvement for dst strings >=1024.

Merge from internal master.

(cherry-picked from d119b7b6f4)

Change-Id: I296463b251ef9fab004ee4dded2793feca5b547a
2013-08-08 11:13:46 -07:00
Christopher Ferris
9ad2a73ed6 Fix assembler errors in generic arm strlen.c.
Tested using a static version of the strlen libc_test program
on a nexus7 that uses the generic code.

Merge from internal master.

(cherry-picked from d8d10a8994)

Change-Id: I88f7dc01dc5b5c3ac2d5580d92153bc1bc36c564
2013-07-16 16:47:54 -07:00
Christopher Ferris
0aa9b52efa Add new optimized strlen for arm.
This optimized version is primarily targeted at cortex-a15.

Tested on all nexus devices using the system/extras/libc_test strlen test.
Tested alignments from 1 to 32 that are powers of 2.
Tested that strlen does not cross page boundaries at all alignments.

Speed improvements listed below:

cortex-a15
- Sizes >= 32 bytes, ~75% improvement.
- Sizes >= 1024 bytes, ~250% improvement.

cortex-a9
- Sizes >= 32 bytes, ~75% improvement.
- Sizes >= 1024 bytes, ~85% improvement.

krait
- Sizes >= 32 bytes, ~95% improvement.
- Sizes >= 1024 bytes, ~160% improvement.

Merge from internal master.

(cherry-picked from 2fc0717977)

Change-Id: I1ceceb4e745fd68e9d946f96d1d42e0cdaff6ccf
2013-07-16 16:47:37 -07:00
Christopher Ferris
31dea25b8b Create arch specific versions of strcmp.
This uses the new strcmp.a15.S code as the basis for new versions
of strcmp.S.

The cortex-a15 code is the performance optimized version of strcmp.a15.S
taken with only the addition of a few pld instructions.
The cortex-a9 code is the same as the cortex-a15 code except that the
unaligned strcmp code was taken from the original strcmp.S.
The krait code is the same as the cortex-a15 code except that one path
in the unaligned strcmp code was taken from the original strcmp.S code
(the 2 byte overlap case).
The generic code is the original unmodified strmp.S from the bionic
subdirectory.

All three new versions underwent these test cases:

Strings the same, all same size:
- Both pointers double word aligned.
- One pointer double word aligned, one pointer word aligned.
- Both pointers word aligned.
- One pointer double word aligned, one pointer 1 off a word alignment.
- One pointer double word aligned, one pointer 2 off a word alignment.
- One pointer double word aligned, one pointer 3 off a word alignment.
- One pointer word aligned, one pointer 1 off a word alignment.
- One pointer word aligned, one pointer 2 off a word alignment.
- One pointer word aligned, one pointer 3 off a word alignment.
For all cases where it made sense, the two pointers were also tested
swapped.

Different strings, all same size:
- Single difference at double word boundary.
- Single difference at word boudary.
- Single difference at 1 off a word alignment.
- Single difference at 2 off a word alignment.
- Single difference at 3 off a word alignment.

Different sized strings, strings the same until the end:
- Shorter string ends on a double word boundary.
- Shorter string ends on word boundary.
- Shorter string ends at 1 off a word boundary.
- Shorter string ends at 2 off a word boundary.
- Shorter string ends at 3 off a word boundary.

For all different cases, run them through the same pointer alignment
cases when the strings are the same size.
For all cases the two pointers were also tested swapped.

Bug: 8005082

Merge from internal master.

(cherry-picked from commit a9a5870d16)

Change-Id: I4c2b98f8a50804fb98ab67f75e9d660f1315a144
2013-03-20 14:33:54 -07:00
Christopher Ferris
04954a43b3 Break bionic implementations into arch versions.
Move arch specific code for arm, mips, x86 into separate
makefiles.
In addition, add different arm cpu versions of memcpy/memset.

Bug: 8005082

Merge from internal master (acdde8c1cf).

Change-Id: I04f3d0715104fab618e1abf7cf8f7eec9bec79df
2013-03-12 14:06:08 -07:00