Commit graph

186 commits

Author SHA1 Message Date
Haibo Huang
ece43e14c9 Use cortex-a53/bionic/memmove.S by default for arm64
cortex-a53/bionic/memmove.S looks like a more optimized version. It
should be used in most cases. It delegates small (<= 96 bytes) moves
to memcpy.

The only exception is denver64. It is using its own memcpy, which
doesn't allow overlap for < 96 bytes copies. Only for this variant we
need generic/bionic/memmove.S.

Benchmark result looks pretty close through (on marlin)

Before: using generic/bionic/memmove.S

-------------------------------------------------------------------
Benchmark                            Time           CPU Iterations
-------------------------------------------------------------------
BM_string_memcpy/8/0/0               6 ns          6 ns  108872005   1.15787GB/s
BM_string_memcpy/64/0/0              7 ns          7 ns  107387438   9.14365GB/s
BM_string_memcpy/512/0/0            21 ns         20 ns   34165353   23.2734GB/s
BM_string_memcpy/1024/0/0           40 ns         39 ns   17766657   24.2346GB/s
BM_string_memcpy/8192/0/0          311 ns        310 ns    2259904   24.6339GB/s
BM_string_memcpy/16384/0/0         616 ns        613 ns    1143027   24.8852GB/s
BM_string_memcpy/32768/0/0        1322 ns       1316 ns     530799   23.1835GB/s
BM_string_memcpy/65536/0/0        2672 ns       2661 ns     229638    22.937GB/s
BM_string_memcpy/131072/0/0       5379 ns       5357 ns     128316    22.788GB/s

After: using cortex-a53/bionic/memmove.S

-------------------------------------------------------------------
Benchmark                            Time           CPU Iterations
-------------------------------------------------------------------
BM_string_memcpy/8/0/0               6 ns          6 ns  116610749   1.24646GB/s
BM_string_memcpy/64/0/0              6 ns          6 ns  115634093   9.84708GB/s
BM_string_memcpy/512/0/0            21 ns         21 ns   34167322   22.8938GB/s
BM_string_memcpy/1024/0/0           39 ns         39 ns   17859445   24.3312GB/s
BM_string_memcpy/8192/0/0          311 ns        310 ns    2260192   24.6325GB/s
BM_string_memcpy/16384/0/0         610 ns        608 ns    1151889   25.0987GB/s
BM_string_memcpy/32768/0/0        1488 ns       1482 ns     532508   20.5988GB/s
BM_string_memcpy/65536/0/0        2421 ns       2411 ns     290502   25.3146GB/s
BM_string_memcpy/131072/0/0       5278 ns       5256 ns     132710   23.2234GB/s

Test: Build and benchmark on marlin
Bug: http://b/63992911
Change-Id: Id85961aca18ba841bcbcfe0d8b162843eab30584
2018-05-30 11:09:19 -07:00
Mark Salyzyn
79249b0897 bionic: add vdso clock_getres
clock_getres() should not be a hot call, nevertheless it is
~6-7 times faster for supported clock ids if it uses
__vdso_clock_getres if available.  There is a 3% performance
penalty for unsupported clock ids via __vdso_clock_getres with
respect to a direct syscall.

[TL;DR]

w/vdso32 kernel patches, locked cores to MAX, little cores only.

BEFORE:

hikey960 vdso (aarch64):

----------------------------------------------------------------------
Benchmark                               Time           CPU Iterations
----------------------------------------------------------------------
BM_time_clock_getres                  126 ns        126 ns    5577874
BM_time_clock_getres_syscall          127 ns        127 ns    5505016
BM_time_clock_getres_REALTIME         126 ns        126 ns    5574682
BM_time_clock_getres_BOOTTIME         126 ns        126 ns    5575237
BM_time_clock_getres_TAI              126 ns        126 ns    5576810
BM_time_clock_getres_unsupported      128 ns        128 ns    5480189

hikey960 vdso32 (aarch32):

----------------------------------------------------------------------
Benchmark                               Time           CPU Iterations
----------------------------------------------------------------------
BM_time_clock_getres                  199 ns        199 ns    3508708
BM_time_clock_getres_syscall          220 ns        220 ns    3184676
BM_time_clock_getres_REALTIME         199 ns        199 ns    3509697
BM_time_clock_getres_BOOTTIME         199 ns        199 ns    3513551
BM_time_clock_getres_TAI              200 ns        199 ns    3512412
BM_time_clock_getres_unsupported      196 ns        196 ns    3575609

x86_64 (glibc):

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
BM_time_clock_getres                 252 ns        252 ns    2370263
BM_time_clock_getres_syscall         215 ns        215 ns    3287497
BM_time_clock_getres_REALTIME        214 ns        214 ns    3294228
BM_time_clock_getres_BOOTTIME        213 ns        213 ns    3277519
BM_time_clock_getres_TAI             213 ns        213 ns    3294991
BM_time_clock_getres_unsupported     206 ns        206 ns    3450654

imx7d_pico IOT nyc (w/arm,cpu-registers-not-fw-configured) (armv7a):
(Virtual Timers)

Benchmark                           Time(ns)    CPU(ns) Iterations
------------------------------------------------------------------
BM_time_clock_getres                      16        345    2000000
BM_time_clock_getres_syscall              16        339    2121212
BM_time_clock_getres_REALTIME             17        350    2058824
BM_time_clock_getres_BOOTTIME             17        345    2000000
BM_time_clock_getres_TAI                  16        350    2000000
BM_time_clock_getres_unsupported          13        284    2500000

AFTER:

hikey960 vdso (aarch64):

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
BM_time_clock_getres                  18 ns         18 ns   37880389
BM_time_clock_getres_syscall         127 ns        127 ns    5520029
BM_time_clock_getres_REALTIME         18 ns         18 ns   37879962
BM_time_clock_getres_BOOTTIME         19 ns         18 ns   37878361
BM_time_clock_getres_TAI             131 ns        131 ns    5368484
BM_time_clock_getres_unsupported      97 ns         97 ns    7182864

hikey960 vdso32 (aarch32):

---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
BM_time_clock_getres                  36 ns         36 ns   19205240
BM_time_clock_getres_syscall         212 ns        212 ns    3297100
BM_time_clock_getres_REALTIME         36 ns         36 ns   19219109
BM_time_clock_getres_BOOTTIME         36 ns         36 ns   19222490
BM_time_clock_getres_TAI             206 ns        206 ns    3402868
BM_time_clock_getres_unsupported     159 ns        159 ns    4409492

imx7d_pico IOT nyc (wo/arm,cpu-registers-not-fw-configured) (armv7a):
(Physical Timers)

Benchmark                           Time(ns)    CPU(ns) Iterations
------------------------------------------------------------------
BM_time_clock_getres                       2         48   14000000
BM_time_clock_getres_syscall              14        335    2058824
BM_time_clock_getres_REALTIME              2         49   14583333
BM_time_clock_getres_BOOTTIME              2         48   14000000
BM_time_clock_getres_TAI                  14        350    2058824
BM_time_clock_getres_unsupported           8        203    3500000

Test: taskset F \
        /data/benchmarktest{64}/bionic-benchmarks/bionic-benchmarks \
        --bionic_xml=vdso.xml --benchmark_filter=BM_time_clock_getres*
Bug: 63737556
Change-Id: I80c0a5106625d76720287f715fcf145d2aad1705
2017-12-07 09:41:48 -08:00
Sebastian Pop
ed9bfc4616 [AArch64] Optimized memcmp
Patch written by Wilco Dijkstra submitted for review to newlib:
https://sourceware.org/ml/newlib/2017/msg00524.html

This is an optimized memcmp for AArch64.  This is a complete rewrite
using a different algorithm.  The previous version split into cases
where both inputs were aligned, the inputs were mutually aligned and
unaligned using a byte loop.  The new version combines all these cases,
while small inputs of less than 8 bytes are handled separately.

This allows the main code to be sped up using unaligned loads since
there are now at least 8 bytes to be compared.  After the first 8 bytes,
align the first input.  This ensures each iteration does at most one
unaligned access and mutually aligned inputs behave as aligned.
After the main loop, process the last 8 bytes using unaligned accesses.

This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes >6 times faster) on large inputs.

2017-06-28  Wilco Dijkstra  <wdijkstr@arm.com>

        * bionic/libc/arch-arm64/generic/bionic/memcmp.S (memcmp):
                Rewrite of optimized memcmp.

GLIBC benchtests/bench-memcmp.c performance comparison for Cortex-A53:

Length    1, alignment  1/ 1:        153%
Length    1, alignment  1/ 1:        119%
Length    1, alignment  1/ 1:        154%
Length    2, alignment  2/ 2:        121%
Length    2, alignment  2/ 2:        140%
Length    2, alignment  2/ 2:        121%
Length    3, alignment  3/ 3:        105%
Length    3, alignment  3/ 3:        105%
Length    3, alignment  3/ 3:        105%
Length    4, alignment  4/ 4:        155%
Length    4, alignment  4/ 4:        154%
Length    4, alignment  4/ 4:        161%
Length    5, alignment  5/ 5:        173%
Length    5, alignment  5/ 5:        173%
Length    5, alignment  5/ 5:        173%
Length    6, alignment  6/ 6:        145%
Length    6, alignment  6/ 6:        145%
Length    6, alignment  6/ 6:        145%
Length    7, alignment  7/ 7:        125%
Length    7, alignment  7/ 7:        125%
Length    7, alignment  7/ 7:        125%
Length    8, alignment  8/ 8:        111%
Length    8, alignment  8/ 8:        130%
Length    8, alignment  8/ 8:        124%
Length    9, alignment  9/ 9:        160%
Length    9, alignment  9/ 9:        160%
Length    9, alignment  9/ 9:        150%
Length   10, alignment 10/10:        170%
Length   10, alignment 10/10:        137%
Length   10, alignment 10/10:        150%
Length   11, alignment 11/11:        160%
Length   11, alignment 11/11:        160%
Length   11, alignment 11/11:        160%
Length   12, alignment 12/12:        146%
Length   12, alignment 12/12:        168%
Length   12, alignment 12/12:        156%
Length   13, alignment 13/13:        167%
Length   13, alignment 13/13:        167%
Length   13, alignment 13/13:        173%
Length   14, alignment 14/14:        167%
Length   14, alignment 14/14:        168%
Length   14, alignment 14/14:        168%
Length   15, alignment 15/15:        168%
Length   15, alignment 15/15:        173%
Length   15, alignment 15/15:        173%
Length    1, alignment  0/ 0:        134%
Length    1, alignment  0/ 0:        127%
Length    1, alignment  0/ 0:        119%
Length    2, alignment  0/ 0:        94%
Length    2, alignment  0/ 0:        94%
Length    2, alignment  0/ 0:        106%
Length    3, alignment  0/ 0:        82%
Length    3, alignment  0/ 0:        87%
Length    3, alignment  0/ 0:        82%
Length    4, alignment  0/ 0:        115%
Length    4, alignment  0/ 0:        115%
Length    4, alignment  0/ 0:        122%
Length    5, alignment  0/ 0:        127%
Length    5, alignment  0/ 0:        119%
Length    5, alignment  0/ 0:        127%
Length    6, alignment  0/ 0:        103%
Length    6, alignment  0/ 0:        100%
Length    6, alignment  0/ 0:        100%
Length    7, alignment  0/ 0:        82%
Length    7, alignment  0/ 0:        91%
Length    7, alignment  0/ 0:        87%
Length    8, alignment  0/ 0:        111%
Length    8, alignment  0/ 0:        124%
Length    8, alignment  0/ 0:        124%
Length    9, alignment  0/ 0:        136%
Length    9, alignment  0/ 0:        136%
Length    9, alignment  0/ 0:        136%
Length   10, alignment  0/ 0:        136%
Length   10, alignment  0/ 0:        135%
Length   10, alignment  0/ 0:        136%
Length   11, alignment  0/ 0:        136%
Length   11, alignment  0/ 0:        136%
Length   11, alignment  0/ 0:        135%
Length   12, alignment  0/ 0:        136%
Length   12, alignment  0/ 0:        136%
Length   12, alignment  0/ 0:        136%
Length   13, alignment  0/ 0:        135%
Length   13, alignment  0/ 0:        136%
Length   13, alignment  0/ 0:        136%
Length   14, alignment  0/ 0:        136%
Length   14, alignment  0/ 0:        136%
Length   14, alignment  0/ 0:        136%
Length   15, alignment  0/ 0:        136%
Length   15, alignment  0/ 0:        136%
Length   15, alignment  0/ 0:        136%
Length    4, alignment  0/ 0:        115%
Length    4, alignment  0/ 0:        115%
Length    4, alignment  0/ 0:        115%
Length   32, alignment  0/ 0:        127%
Length   32, alignment  7/ 2:        395%
Length   32, alignment  0/ 0:        127%
Length   32, alignment  0/ 0:        127%
Length    8, alignment  0/ 0:        111%
Length    8, alignment  0/ 0:        124%
Length    8, alignment  0/ 0:        124%
Length   64, alignment  0/ 0:        128%
Length   64, alignment  6/ 4:        475%
Length   64, alignment  0/ 0:        131%
Length   64, alignment  0/ 0:        134%
Length   16, alignment  0/ 0:        128%
Length   16, alignment  0/ 0:        119%
Length   16, alignment  0/ 0:        128%
Length  128, alignment  0/ 0:        129%
Length  128, alignment  5/ 6:        475%
Length  128, alignment  0/ 0:        130%
Length  128, alignment  0/ 0:        129%
Length   32, alignment  0/ 0:        126%
Length   32, alignment  0/ 0:        126%
Length   32, alignment  0/ 0:        126%
Length  256, alignment  0/ 0:        127%
Length  256, alignment  4/ 8:        545%
Length  256, alignment  0/ 0:        126%
Length  256, alignment  0/ 0:        128%
Length   64, alignment  0/ 0:        171%
Length   64, alignment  0/ 0:        171%
Length   64, alignment  0/ 0:        174%
Length  512, alignment  0/ 0:        126%
Length  512, alignment  3/10:        585%
Length  512, alignment  0/ 0:        126%
Length  512, alignment  0/ 0:        127%
Length  128, alignment  0/ 0:        129%
Length  128, alignment  0/ 0:        128%
Length  128, alignment  0/ 0:        129%
Length 1024, alignment  0/ 0:        125%
Length 1024, alignment  2/12:        611%
Length 1024, alignment  0/ 0:        126%
Length 1024, alignment  0/ 0:        126%
Length  256, alignment  0/ 0:        128%
Length  256, alignment  0/ 0:        127%
Length  256, alignment  0/ 0:        128%
Length 2048, alignment  0/ 0:        125%
Length 2048, alignment  1/14:        625%
Length 2048, alignment  0/ 0:        125%
Length 2048, alignment  0/ 0:        125%
Length  512, alignment  0/ 0:        126%
Length  512, alignment  0/ 0:        127%
Length  512, alignment  0/ 0:        127%
Length 4096, alignment  0/ 0:        125%
Length 4096, alignment  0/16:        125%
Length 4096, alignment  0/ 0:        125%
Length 4096, alignment  0/ 0:        125%
Length 1024, alignment  0/ 0:        126%
Length 1024, alignment  0/ 0:        126%
Length 1024, alignment  0/ 0:        126%
Length 8192, alignment  0/ 0:        125%
Length 8192, alignment 63/18:        636%
Length 8192, alignment  0/ 0:        125%
Length 8192, alignment  0/ 0:        125%
Length   16, alignment  1/ 2:        317%
Length   16, alignment  1/ 2:        317%
Length   16, alignment  1/ 2:        317%
Length   32, alignment  2/ 4:        395%
Length   32, alignment  2/ 4:        395%
Length   32, alignment  2/ 4:        398%
Length   64, alignment  3/ 6:        475%
Length   64, alignment  3/ 6:        475%
Length   64, alignment  3/ 6:        477%
Length  128, alignment  4/ 8:        479%
Length  128, alignment  4/ 8:        479%
Length  128, alignment  4/ 8:        479%
Length  256, alignment  5/10:        543%
Length  256, alignment  5/10:        539%
Length  256, alignment  5/10:        543%
Length  512, alignment  6/12:        585%
Length  512, alignment  6/12:        585%
Length  512, alignment  6/12:        585%
Length 1024, alignment  7/14:        611%
Length 1024, alignment  7/14:        611%
Length 1024, alignment  7/14:        611%

The performance measured on the bionic-benchmarks on a hikey
board with a new benchmark for unaligned memcmp submitted for
review at https://android-review.googlesource.com/414860

The base is with the libc from /system/lib64. The bionic libc
with this patch is in /data.

hikey:/data # export LD_LIBRARY_PATH=/system/lib64
hikey:/data # ./bionic-benchmarks --benchmark_filter=BM_string_memcmp
Run on (8 X 2.4 MHz CPU s)
Benchmark                                Time           CPU Iterations
----------------------------------------------------------------------
BM_string_memcmp/8                      30 ns         30 ns   22955680    251.07MB/s
BM_string_memcmp/64                     57 ns         57 ns   12349184   1076.99MB/s
BM_string_memcmp/512                   305 ns        305 ns    2297163   1.56496GB/s
BM_string_memcmp/1024                  571 ns        571 ns    1225211   1.66912GB/s
BM_string_memcmp/8k                   4307 ns       4306 ns     162562   1.77177GB/s
BM_string_memcmp/16k                  8676 ns       8675 ns      80676   1.75887GB/s
BM_string_memcmp/32k                 19233 ns      19230 ns      36394   1.58695GB/s
BM_string_memcmp/64k                 36986 ns      36984 ns      18952   1.65029GB/s
BM_string_memcmp_aligned/8             199 ns        199 ns    3519166   38.3336MB/s
BM_string_memcmp_aligned/64            386 ns        386 ns    1810734   158.073MB/s
BM_string_memcmp_aligned/512          1735 ns       1734 ns     403981   281.525MB/s
BM_string_memcmp_aligned/1024         3200 ns       3200 ns     218838   305.151MB/s
BM_string_memcmp_aligned/8k          25084 ns      25080 ns      28180   311.507MB/s
BM_string_memcmp_aligned/16k         51730 ns      51729 ns      13521   302.057MB/s
BM_string_memcmp_aligned/32k        103228 ns     103228 ns       6782   302.727MB/s
BM_string_memcmp_aligned/64k        207117 ns     207087 ns       3450   301.806MB/s
BM_string_memcmp_unaligned/8           339 ns        339 ns    2070998   22.5302MB/s
BM_string_memcmp_unaligned/64         1392 ns       1392 ns     502796   43.8454MB/s
BM_string_memcmp_unaligned/512        9194 ns       9194 ns      76133   53.1104MB/s
BM_string_memcmp_unaligned/1024      18325 ns      18323 ns      38206   53.2963MB/s
BM_string_memcmp_unaligned/8k       148579 ns     148574 ns       4713   52.5831MB/s
BM_string_memcmp_unaligned/16k      298169 ns     298120 ns       2344   52.4118MB/s
BM_string_memcmp_unaligned/32k      598813 ns     598797 ns       1085    52.188MB/s
BM_string_memcmp_unaligned/64k     1196079 ns    1196083 ns        540   52.2539MB/s

hikey:/data # export LD_LIBRARY_PATH=/data
hikey:/data # ./bionic-benchmarks --benchmark_filter=BM_string_memcmp

Benchmark                                Time           CPU Iterations
----------------------------------------------------------------------
BM_string_memcmp/8                      27 ns         27 ns   26198166   286.069MB/s
BM_string_memcmp/64                     45 ns         45 ns   15553753   1.32443GB/s
BM_string_memcmp/512                   242 ns        242 ns    2892423   1.97049GB/s
BM_string_memcmp/1024                  455 ns        455 ns    1537290   2.09436GB/s
BM_string_memcmp/8k                   3446 ns       3446 ns     203295   2.21392GB/s
BM_string_memcmp/16k                  7567 ns       7567 ns      92582   2.01657GB/s
BM_string_memcmp/32k                 16081 ns      16081 ns      43524    1.8977GB/s
BM_string_memcmp/64k                 31029 ns      31028 ns      22565   1.96712GB/s
BM_string_memcmp_aligned/8             184 ns        184 ns    3800912   41.3654MB/s
BM_string_memcmp_aligned/64            287 ns        287 ns    2438835    212.65MB/s
BM_string_memcmp_aligned/512          1370 ns       1370 ns     511014   356.498MB/s
BM_string_memcmp_aligned/1024         2543 ns       2543 ns     275253   384.006MB/s
BM_string_memcmp_aligned/8k          20413 ns      20411 ns      34306   382.764MB/s
BM_string_memcmp_aligned/16k         42908 ns      42907 ns      16132   364.158MB/s
BM_string_memcmp_aligned/32k         88902 ns      88886 ns       8087   351.574MB/s
BM_string_memcmp_aligned/64k        173016 ns     173007 ns       4122   361.258MB/s
BM_string_memcmp_unaligned/8           212 ns        212 ns    3304163   36.0243MB/s
BM_string_memcmp_unaligned/64          361 ns        361 ns    1941597   169.279MB/s
BM_string_memcmp_unaligned/512        1754 ns       1753 ns     399210   278.492MB/s
BM_string_memcmp_unaligned/1024       3308 ns       3308 ns     211622   295.243MB/s
BM_string_memcmp_unaligned/8k        27227 ns      27225 ns      25637   286.964MB/s
BM_string_memcmp_unaligned/16k       55877 ns      55874 ns      12455   279.645MB/s
BM_string_memcmp_unaligned/32k      112397 ns     112366 ns       6200    278.11MB/s
BM_string_memcmp_unaligned/64k      223493 ns     223482 ns       3127   279.665MB/s

Test: bionic-benchmarks --benchmark_filter='BM_string_memcmp*'

Change-Id: Ia16a8cf69c68b8c0533f025f03b925c9883bb708
2017-11-03 13:21:07 -04:00
dimitry
fa432524a6 Mark __BIONIC_WEAK_FOR_NATIVE_BRIDGE symbols
To make it easier for Native Bridge implementations
to override these symbols.

Bug: http://b/67993967
Test: make
Change-Id: I4c53e53af494bca365dd2b3305ab0ccc2b23ba44
2017-10-27 10:01:46 +02:00
Elliott Hughes
d4ca231ae2 Unified sysroot: kill arch-specific include dirs.
<machine/asm.h> was internal use only.

<machine/fenv.h> is quite large, but can live in <bits/...>.

<machine/regdef.h> is trivially replaced by saying $x instead of x in
our assembler.

<machine/setjmp.h> is trivially inlined into <setjmp.h>.

<sgidefs.h> is unused.

Bug: N/A
Test: builds
Change-Id: Id05dbab43a2f9537486efb8f27a5ef167b055815
2017-10-12 13:19:51 -07:00
Elliott Hughes
8465e968a8 Add <sys/random.h>.
iOS 10 has <sys/random.h> with getentropy, glibc >= 2.25 has
<sys/random.h> with getentropy and getrandom. (glibc also pollutes
<unistd.h>, but that seems like a bad idea.)

Also, all supported devices now have kernels with the getrandom system
call.

We've had these available internally for a while, but it seems like the
time is ripe to expose them.

Bug: http://b/67014255
Test: ran tests
Change-Id: I76dde1e3a2d0bc82777eea437ac193f96964f138
2017-09-29 05:31:35 +00:00
Elliott Hughes
896362eb0e Add syncfs(2).
GMM calls this system call directly at the moment. That's silly.

Bug: http://b/36405699
Test: ran tests
Change-Id: I1e14c0e5ce0bc2aa888d884845ac30dc20f13cd5
2017-08-24 16:31:49 -07:00
George Burgess IV
6cb0687932 Split our FORTIFY implementation into libc_fortify
As requested in the bug. This also rips __memcpy_chk out of memcpy.S,
which lets us cut down on copypasta (all of the implementations look
identical).

Bug: 12231437
Test: mma on aosp_{arm,arm64,mips,x86,x86_64} internal master;
checkbuild on bullhead internal master; CtsBionicTestCases on bullhead.
No new failures.
Change-Id: I88c39ca166bacde0b692aa3063e743bb046a5d2f
2017-07-24 14:20:16 -07:00
Elliott Hughes
94072fbb4e Switch to inline assembler in crtbegin.
Using __builtin_frame_address was clever, but didn't work for arm64 (for
reasons which were never investigated) and the ChromeOS folks claim it
causes trouble for x86 with ARC++ (though without a reproduceable test case).

Naked functions turn out to be quite unevenly supported: some architectures
do the right thing, others don't; some architectures warn, others don't (and
the warnings don't always match the platforms that _actually_ have problems).

Inline assembler also removes the guessing games: everyone knows what the
couple of instructions _ought_ to be, and now we don't have to reason about
what the compiler will actually do (yet still keep the majority of the code
in C).

Bug: N/A
Test: builds, boots
Change-Id: I14207ef50ca46b6eca273c3cb7509c311146a3ca
2017-05-23 14:47:16 -07:00
Kevin Brodsky
f19eeb8446 libc: ARM64: fix memset for non-standard ZVA sizes
372f19e9e2 ("libc: ARM64: update memset/strlen/memcpy/memmove to
newlib/cortex-strings") introduced a bug in memset, only occurring
on the [set_long + zero + non-standard ZVA size] path, more
specifically when DCZID_EL0 reports a size different to 64 or 128.

On platforms with such sizes reported by DCZID_EL0, various string*
unit tests fail due to memset zeroing memory before and/or after the
area it is supposed to set.

Test: bionic-unit-tests --gtest_filter=string*
Change-Id: Idb80c0269226e40e343645a58608e3f324378468
2017-05-16 11:29:49 +01:00
Jake Weinstein
28285f5338 libc: clean up ARM64 copyright notices
Test: None needed

Change-Id: I3626a92329e954f67bada6ed73f3033225bbfef5
2017-05-04 12:59:53 -04:00
Elliott Hughes
5109bb463d Make all the ELF relocation constants available.
BSD thinks you should only get the relocation constants for your target
architecture, but it's often useful to have them all available at once.
Rearrange the headers to enable that.

Also update the (modified) NetBSD files to CVS HEAD.

Also remove the unused BSDism R_TYPE.

Bug: N/A
Test: builds
Change-Id: Iad5ef29192a732696e2b36af35144a9ca116aa46
2017-04-19 13:28:32 -07:00
Elliott Hughes
901601b48e Remove unused elf_machdep.h cruft.
Also add a few missing include guards.

Bug: N/A
Test: builds
Change-Id: I9557303c81a4b11d430112528def038ecb5562a9
2017-04-17 16:25:09 -07:00
Yuanyuan Zhong
9d150dd9a0 bionic: arm64: generic: strcmp: align to 64B cache line
Align strcmp to 64B. This will ensure the preformance critical
loop is within one 64B cache line.

Change-Id: I88eef2f12b2a6442cacec9cdbdffbf17293e7d32
Signed-off-by: Yuanyuan Zhong <zyy@motorola.com>
Reviewed-on: https://gerrit.mot.com/902536
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver <sltawvr@motorola.com>
Tested-by: Jira Key <jirakey@motorola.com>
Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com>
Reviewed-by: Igor Kovalenko <igork@motorola.com>
Submit-Approved: Jira Key <jirakey@motorola.com>
2017-03-20 17:54:29 +00:00
Jake Weinstein
372f19e9e2 libc: ARM64: update memset/strlen/memcpy/memmove to newlib/cortex-strings
* Bionic benchmarks results at the bottom

* This is a squash of the following commits:

libc: ARM64: optimize memset.

 This is an optimized memset for AArch64.  Memset is split into 4 main
 cases: small sets of up to 16 bytes, medium of 16..96 bytes which are
 fully unrolled.  Large memsets of more than 96 bytes align the
 destination and use an unrolled loop processing 64 bytes per
 iteration.  Memsets of zero of more than 256 use the dc zva
 instruction, and there are faster versions for the common ZVA sizes 64
 or 128.  STP of Q registers is used to reduce codesize without loss of
 performance.

Change-Id: I0c5b5ec5ab8a1fd0f23eee8fbacada0be08e841f

libc: ARM64: improve performance in strlen

Change-Id: Ic20f93a0052a49bd76cd6795f51e8606ccfbf11c

libc: ARM64: Optimize memcpy.

 This is an optimized memcpy for AArch64.  Copies are split into 3 main
 cases: small copies of up to 16 bytes, medium copies of 17..96 bytes
 which are fully unrolled.  Large copies of more than 96 bytes align
 the destination and use an unrolled loop processing 64 bytes per
 iteration.  In order to share code with memmove, small and medium
 copies read all data before writing, allowing any kind of overlap.  On
 a random copy test memcpy is 40.8% faster on A57 and 28.4% on A53.

Change-Id: Ibb9483e45bbc0e8ca3d5ce98a31c55dfd8a5ac28

libc: AArch64: Tune memcpy

* Further tuning for performance.

Change-Id: Id08eaab885f9743fa7575077924a947c1b88e4ff

libc: ARM64: optimize memmove for Cortex-A53

* Sadly does not work on Denver or Kryo, so can't go to generic

 This is an optimized memmove for AArch64.  All copies of up to 96
 bytes and all backward copies are done by the new memcpy.  The only
 remaining case is large forward copies which are done in the same way
 as the memcpy loop, but copying from the end rather than the start.

Tested on the Nextbit Robin with MSM8992 (Snapdragon 808):

Before
BM_string_memcmp/8                          1000k         27    0.286 GiB/s
BM_string_memcmp/64                           50M         20    3.053 GiB/s
BM_string_memcmp/512                          20M        126    4.060 GiB/s
BM_string_memcmp/1024                         10M        234    4.372 GiB/s
BM_string_memcmp/8Ki                        1000k       1726    4.745 GiB/s
BM_string_memcmp/16Ki                        500k       3711    4.415 GiB/s
BM_string_memcmp/32Ki                        200k       8276    3.959 GiB/s
BM_string_memcmp/64Ki                        100k      16351    4.008 GiB/s
BM_string_memcpy/8                          1000k         13    0.612 GiB/s
BM_string_memcpy/64                         1000k          8    7.187 GiB/s
BM_string_memcpy/512                          50M         38   13.311 GiB/s
BM_string_memcpy/1024                         20M         86   11.858 GiB/s
BM_string_memcpy/8Ki                           5M        620   13.203 GiB/s
BM_string_memcpy/16Ki                       1000k       1265   12.950 GiB/s
BM_string_memcpy/32Ki                        500k       2977   11.004 GiB/s
BM_string_memcpy/64Ki                        500k       8003    8.188 GiB/s
BM_string_memmove/8                         1000k         11    0.684 GiB/s
BM_string_memmove/64                        1000k         16    3.855 GiB/s
BM_string_memmove/512                         50M         57    8.915 GiB/s
BM_string_memmove/1024                        20M        117    8.720 GiB/s
BM_string_memmove/8Ki                          2M        853    9.594 GiB/s
BM_string_memmove/16Ki                      1000k       1731    9.462 GiB/s
BM_string_memmove/32Ki                       500k       3566    9.189 GiB/s
BM_string_memmove/64Ki                       500k       7708    8.501 GiB/s
BM_string_memset/8                          1000k         16    0.487 GiB/s
BM_string_memset/64                         1000k         16    3.995 GiB/s
BM_string_memset/512                          50M         37   13.489 GiB/s
BM_string_memset/1024                         50M         58   17.405 GiB/s
BM_string_memset/8Ki                           5M        451   18.160 GiB/s
BM_string_memset/16Ki                          2M        883   18.554 GiB/s
BM_string_memset/32Ki                       1000k       2181   15.022 GiB/s
BM_string_memset/64Ki                        500k       4563   14.362 GiB/s
BM_string_strlen/8                          1000k          8    0.965 GiB/s
BM_string_strlen/64                         1000k         16    3.855 GiB/s
BM_string_strlen/512                          20M         92    5.540 GiB/s
BM_string_strlen/1024                         10M        167    6.111 GiB/s
BM_string_strlen/8Ki                        1000k       1237    6.620 GiB/s
BM_string_strlen/16Ki                       1000k       2765    5.923 GiB/s
BM_string_strlen/32Ki                        500k       6135    5.341 GiB/s
BM_string_strlen/64Ki                        200k      13168    4.977 GiB/s

After
BM_string_memcmp/8                          1000k         21    0.369 GiB/s
BM_string_memcmp/64                         1000k         28    2.272 GiB/s
BM_string_memcmp/512                          20M        128    3.983 GiB/s
BM_string_memcmp/1024                         10M        234    4.375 GiB/s
BM_string_memcmp/8Ki                        1000k       1732    4.728 GiB/s
BM_string_memcmp/16Ki                        500k       3485    4.701 GiB/s
BM_string_memcmp/32Ki                        500k       7031    4.660 GiB/s
BM_string_memcmp/64Ki                        200k      14296    4.584 GiB/s
BM_string_memcpy/8                          1000k          5    1.458 GiB/s
BM_string_memcpy/64                         1000k          7    8.952 GiB/s
BM_string_memcpy/512                          50M         36   13.907 GiB/s
BM_string_memcpy/1024                         20M         80   12.750 GiB/s
BM_string_memcpy/8Ki                           5M        572   14.307 GiB/s
BM_string_memcpy/16Ki                       1000k       1165   14.053 GiB/s
BM_string_memcpy/32Ki                        500k       3141   10.430 GiB/s
BM_string_memcpy/64Ki                        500k       7008    9.351 GiB/s
BM_string_memmove/8                           50M          7    1.074 GiB/s
BM_string_memmove/64                        1000k          9    6.593 GiB/s
BM_string_memmove/512                         50M         37   13.502 GiB/s
BM_string_memmove/1024                        20M         80   12.656 GiB/s
BM_string_memmove/8Ki                          5M        573   14.281 GiB/s
BM_string_memmove/16Ki                      1000k       1168   14.018 GiB/s
BM_string_memmove/32Ki                      1000k       2825   11.599 GiB/s
BM_string_memmove/64Ki                       500k       6548   10.008 GiB/s
BM_string_memset/8                          1000k          7    1.038 GiB/s
BM_string_memset/64                         1000k          8    7.151 GiB/s
BM_string_memset/512                        1000k         29   17.272 GiB/s
BM_string_memset/1024                         50M         53   18.969 GiB/s
BM_string_memset/8Ki                           5M        424   19.300 GiB/s
BM_string_memset/16Ki                          2M        846   19.350 GiB/s
BM_string_memset/32Ki                       1000k       2028   16.156 GiB/s
BM_string_memset/64Ki                        500k       4514   14.517 GiB/s
BM_string_strlen/8                          1000k          7    1.120 GiB/s
BM_string_strlen/64                         1000k         16    3.918 GiB/s
BM_string_strlen/512                          50M         64    7.894 GiB/s
BM_string_strlen/1024                         20M        104    9.815 GiB/s
BM_string_strlen/8Ki                           5M        664   12.337 GiB/s
BM_string_strlen/16Ki                       1000k       1291   12.682 GiB/s
BM_string_strlen/32Ki                       1000k       2940   11.143 GiB/s
BM_string_strlen/64Ki                        500k       6440   10.175 GiB/s

Change-Id: I635bd2798a755256f748b2af19b1a56fb85a40c6
2016-11-28 19:35:12 +00:00
Elliott Hughes
beb8796624 Use ENTRY_PRIVATE in __bionic_clone assembler.
Bug: N/A
Test: bionic tests
Change-Id: Ic651d628be009487a36d0b2e5bcf900b981b1ef9
2016-10-26 17:01:58 -07:00
Colin Cross
7510c33b61 Remove deprecated Android.mk files
These directories all have Android.bp files that are always used now,
delete the Android.mk files.

Change-Id: Ib0ba2d28bff88483b505426ba61606da314e03ab
2016-05-26 16:41:57 -07:00
Elliott Hughes
eafad49bd6 Add <sys/quota.h>.
It turns out that at least the Nexus 9 kernel is built without CONFIG_QUOTA.
If we decide we're going to mandate quota functionality, I'm happy for us to
be a part of CTS that ensures that happens, but I don't want to be first, so
there's not much to test here other than "will it compile?". The strace
output looks right though.

Bug: http://b/27948821
Bug: http://b/27952303
Change-Id: If667195eee849ed17c8fa9110f6b02907fc8fc04
2016-04-06 11:06:09 -07:00
Elliott Hughes
7f72ad4d6c Add sync_file_range to <fcntl.h>.
Bug: http://b/27952303
Change-Id: Idadfacd657ed415abc11684b9471e4e24c2fbf05
2016-04-05 12:17:22 -07:00
Elliott Hughes
afe835d540 Move math headers in with the other headers.
Keeping them separate is a pain for the NDK, and doesn't help the platform.

Change-Id: I96b8beef307d4a956e9c0a899ad9315adc502582
2016-04-02 08:36:33 -07:00
Greg Hackmann
e2faf07d65 Add {get,set}domainname(2)
{get,set}domainname aren't in POSIX but are widely-implemented
extensions.

The Linux kernel provides a setdomainname syscall but not a symmetric
getdomainname syscall, since it expects userspace to get the domain name
from uname(2).

Change-Id: I96726c242f4bb646c130b361688328b0b97269a0
Signed-off-by: Greg Hackmann <ghackmann@google.com>
2016-03-25 14:16:58 -07:00
Josh Gao
0c3655a864 Add a checksum to jmp_buf on AArch64.
Bug: http://b/27417786
Change-Id: I17c22dc28a46dd6b678b449b506b0da978f3793e
2016-03-03 12:45:08 -08:00
Elliott Hughes
784609317d Mandate optimized __memset_chk for arm and arm64.
This involves actually implementing assembler __memset_chk for arm64,
but that's easily done.

Obviously I'd like this for all architectures (and all the string functions),
but this is low-hanging fruit...

Change-Id: I70ec48c91aafd1f0feb974a2555c51611de9ef82
2016-03-02 11:58:41 -08:00
Elliott Hughes
3c6016f04a Improve diagnostics from the assembler __memcpy_chk routines.
Change-Id: Iec16c92ed80beee505cba2121ea33e3550197b02
2016-03-01 14:45:58 -08:00
Elliott Hughes
b83d6747fa Improve FORTIFY failure diagnostics.
Our FORTIFY _chk functions' implementations were very repetitive and verbose
but not very helpful. We'd also screwed up and put the SSIZE_MAX checks where
they would never fire unless you actually had a buffer as large as half your
address space, which probably doesn't happen very often.

Factor out the duplication and take the opportunity to actually show details
like how big the overrun buffer was, or by how much it was overrun.

Also remove the obsolete FORTIFY event logging.

Also remove the unused __libc_fatal_no_abort.

This change doesn't improve the diagnostics from the optimized assembler
implementations.

Change-Id: I176a90701395404d50975b547a00bd2c654e1252
2016-02-26 22:06:17 -08:00
Elliott Hughes
5f26c6bc91 Really add adjtimex(2), and add clock_adjtime(2) too.
Change-Id: I81fde2ec9fdf787bb19a784ad13df92d33a4f852
2016-02-03 13:19:10 -08:00
Greg Hackmann
3f3f6c526b Add adjtimex
Change-Id: Ia92d35b1851e73c9f157a749dba1e98f68309a8d
Signed-off-by: Greg Hackmann <ghackmann@google.com>
2016-01-28 13:41:22 -08:00
Elliott Hughes
42d949ff9d Defend against -fstack-protector in libc startup.
Exactly which functions get a stack protector is up to the compiler, so
let's separate the code that sets up the environment stack protection
requires and explicitly build it with -fno-stack-protector.

Bug: http://b/26276517
Change-Id: I8719e23ead1f1e81715c32c1335da868f68369b5
2016-01-06 20:06:08 -08:00
Daniel Micay
4200e260d2 fix the mremap signature
The mremap definition was incorrect (unsigned long instead of int) and
it was missing the optional new_address parameter.

Change-Id: Ib9d0675aaa098c21617cedc9b2b8cf267be3aec4
2015-11-06 13:14:43 -08:00
Dan Willemsen
268a673bd1 Switch to LOCAL_SRC_FILES_EXCLUDE
This moves the generic arm/arm64/x86 settings into the main makefiles
and makes the rest of them derivatives. This better aligns with how
soong handles arch/cpu variants.

Also updates the Android.bp to make it consistent with the make
versions.

Change-Id: I5a0275d992bc657459eb6fe1697ad2336731d122
2015-10-20 11:58:28 -07:00
Josh Gao
54db0df8d6 Implement setjmp cookies on AArch64.
Bug: http://b/23942752
Change-Id: I81408ef0dd53010140b51e3083d357d3f2961112
2015-09-17 14:07:24 -07:00
Elliott Hughes
6f4594d5dc Add preadv/pwritev.
Bug: http://b/12612572
Change-Id: I38ff2684d69bd0fe3f21b1d371b88fa60d5421cb
2015-08-26 14:48:55 -07:00
Jake Weinstein
2926f9a31e libc: remove bcopy from memmove on 64-bit architectures
* bcopy is deprecated on LP64 by the following commit:

  ce9ce28e5d

Change-Id: I6849916f0ec4a2d0db9a360999ad1dc8edda952b
2015-08-17 22:06:12 +00:00
Elliott Hughes
5891abdc66 Invalidate cached pid in vfork.
Bug: http://b/23008979
Change-Id: I1dd900ac988cdbe10aad3abc53240c5d352891d5
2015-08-07 19:44:12 -07:00
Tim Murray
9876aa273d Merge "Add support for cortex-a53 in bionic." 2015-06-16 19:04:14 +00:00
Tim Murray
a73b2c961f Add support for cortex-a53 in bionic.
allows -mcpu=cortex-a53 to be passed as part of a command line.

Change-Id: Id4203a9fd197f4c3b661bad21ac58c32819fd687
2015-06-15 21:43:30 -07:00
Elliott Hughes
b1304935b6 Hide accidentally-exposed __clock_nanosleep.
Bug: http://b/21858067
Change-Id: Iaa83a5e17cfff796aed4f641d0d14427614d9399
2015-06-15 19:39:04 -07:00
Elliott Hughes
be57a40d29 Add process_vm_readv and process_vm_writev.
Bug: http://b/21761353
Change-Id: Ic8ef3f241d62d2a4271fbc783c8af50257bac498
2015-06-10 17:24:20 -07:00
Nick Kralevich
e1d0810cd7 Add O_PATH support for flistxattr()
A continuation of commit 2825f10b7f.

Add O_PATH compatibility support for flistxattr(). This allows
a process to list out all the extended attributes associated with
O_PATH file descriptors.

Change-Id: Ie2285ac7ad2e4eac427ddba6c2d182d41b130f75
2015-06-06 11:25:41 -07:00
Nick Kralevich
2825f10b7f libc: Add O_PATH support for fgetxattr / fsetxattr
Support O_PATH file descriptors when handling fgetxattr and fsetxattr.
This avoids requiring file read access to pull extended attributes.

This is needed to support O_PATH file descriptors when calling
SELinux's fgetfilecon() call. In particular, this allows the querying
and setting of SELinux file context by using something like the following
code:

  int dirfd = open("/path/to/dir", O_DIRECTORY);
  int fd = openat(dirfd, "file", O_PATH | O_NOFOLLOW);
  char *context;
  fgetfilecon(fd, &context);

This change was motivated by a comment in
https://android-review.googlesource.com/#/c/152680/1/toys/posix/ls.c

Change-Id: Ic0cdf9f9dd0e35a63b44a4c4a08400020041eddf
2015-06-01 15:51:56 -07:00
Yabin Cui
40a8f214a5 Hide rt_sigqueueinfo.
Bug: 19358804
Change-Id: I38a53ad64c81d0eefdd1d24599e769fd8a477a56
2015-05-18 11:29:20 -07:00
Chih-Hung Hsieh
33f33515b5 Use unified syntax to compile with both llvm and gcc.
All arch-arm and arch-arm64 .S files were compiled
by gcc with and without this patch. The output object files
were identical. When compiled with llvm and this patch,
the output files were also identical to gcc's output.

BUG: 18061004
Change-Id: I458914d512ddf5496e4eb3d288bf032cd526d32b
2015-05-11 17:15:03 -07:00
Dan Albert
7c2c01d681 Revert "Fix volantis boot."
Bug: http://b/20065774
This reverts commit 76e1cbca75.
2015-05-07 15:12:24 -07:00
Dan Albert
6f0d7005f9 Revert "Fix clang build."
Bug: http://b/20065774
This reverts commit 0975a5d9d2.
2015-05-07 15:12:16 -07:00
Dan Albert
f920f821e2 Revert "Try again to fix clang build."
Bug: http://b/20065774
This reverts commit dffd3c5838.

Change-Id: I5dd095ff4ab133baa2afcbd4c79fbee55d05c459
2015-05-07 15:11:48 -07:00
Dmitriy Ivanov
ea295f68f1 Unregister pthread_atfork handlers on dlclose()
Bug: http://b/20339788
Change-Id: I874c87faa377645fa9e0752f4fc166d81fd9ef7e
2015-04-24 17:57:37 -07:00
Dimitry Ivanov
6c63ee41ac Merge "Revert "Unregister pthread_atfork handlers on dlclose()"" 2015-04-24 03:49:30 +00:00
Dimitry Ivanov
094f58fb2a Revert "Unregister pthread_atfork handlers on dlclose()"
The visibility control in pthread_atfork.h is incorrect.
 It breaks 64bit libc.so by hiding pthread_atfork.

 This reverts commit 6df122f852.

Change-Id: I21e4b344d500c6f6de0ccb7420b916c4e233dd34
2015-04-24 03:46:57 +00:00
Elliott Hughes
3da9373fe0 Merge "Simplify close(2) EINTR handling." 2015-04-23 21:14:25 +00:00
Elliott Hughes
3391a9ff13 Simplify close(2) EINTR handling.
This doesn't affect code like Chrome that correctly ignores EINTR on
close, makes code that tries TEMP_FAILURE_RETRY work (where before it might
have closed a different fd and appeared to succeed, or had a bogus EBADF),
and makes "goto fail" code work (instead of mistakenly assuming that EINTR
means that the close failed).

Who loses? Anyone actively trying to detect that they caught a signal while
in close(2). I don't think those people exist, and I think they have better
alternatives available.

Bug: https://code.google.com/p/chromium/issues/detail?id=269623
Bug: http://b/20501816
Change-Id: I11e2f66532fe5d1b0082b2433212e24bdda8219b
2015-04-23 08:41:45 -07:00