platform_bionic/libc/arch-arm64/generic/bionic
Haibo Huang ece43e14c9 Use cortex-a53/bionic/memmove.S by default for arm64
cortex-a53/bionic/memmove.S looks like the more optimized version and
should be used in most cases. It delegates small (<= 96 byte) moves
to memcpy.
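
As a rough illustration (a C sketch of the idea, not the actual assembly
in cortex-a53/bionic/memmove.S; the name memmove_sketch is hypothetical),
the dispatch works roughly like this:

    /* Hypothetical sketch: moves of <= 96 bytes are forwarded to memcpy,
     * so correctness depends on that memcpy tolerating overlapping
     * src/dst at those sizes. */
    #include <stddef.h>
    #include <string.h>

    void *memmove_sketch(void *dst, const void *src, size_t n) {
        if (n <= 96) {
            /* Small move: hand off to memcpy. Only safe if this memcpy
             * copes with overlap at these sizes. */
            return memcpy(dst, src, n);
        }
        /* Larger moves: pick the copy direction based on overlap, as a
         * full memmove must. */
        return memmove(dst, src, n);
    }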

The only exception is denver64, which uses its own memcpy that does not
allow overlap for copies of < 96 bytes. Only this variant still needs
generic/bionic/memmove.S.
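
As a concrete example of the property denver64's memcpy lacks (a
hypothetical standalone check, not a test from the tree), an overlapping
forward shift of fewer than 96 bytes must still come out correct from
memmove:

    /* Hypothetical overlap check: shift 64 bytes forward by 8 within one
     * buffer. memmove must produce the shifted pattern even though src
     * and dst overlap; a memcpy that assumes non-overlapping buffers
     * (as denver64's does for < 96 byte copies) may not. */
    #include <assert.h>
    #include <string.h>

    int main(void) {
        char buf[128];
        for (int i = 0; i < 128; i++) buf[i] = (char)i;
        memmove(buf + 8, buf, 64);  /* overlapping, < 96 bytes */
        for (int i = 0; i < 64; i++) assert(buf[8 + i] == (char)i);
        return 0;
    }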

Benchmark results look pretty close, though (measured on marlin):

Before: using generic/bionic/memmove.S

-------------------------------------------------------------------
Benchmark                            Time           CPU Iterations
-------------------------------------------------------------------
BM_string_memcpy/8/0/0               6 ns          6 ns  108872005   1.15787GB/s
BM_string_memcpy/64/0/0              7 ns          7 ns  107387438   9.14365GB/s
BM_string_memcpy/512/0/0            21 ns         20 ns   34165353   23.2734GB/s
BM_string_memcpy/1024/0/0           40 ns         39 ns   17766657   24.2346GB/s
BM_string_memcpy/8192/0/0          311 ns        310 ns    2259904   24.6339GB/s
BM_string_memcpy/16384/0/0         616 ns        613 ns    1143027   24.8852GB/s
BM_string_memcpy/32768/0/0        1322 ns       1316 ns     530799   23.1835GB/s
BM_string_memcpy/65536/0/0        2672 ns       2661 ns     229638    22.937GB/s
BM_string_memcpy/131072/0/0       5379 ns       5357 ns     128316    22.788GB/s

After: using cortex-a53/bionic/memmove.S

-------------------------------------------------------------------
Benchmark                            Time           CPU Iterations
-------------------------------------------------------------------
BM_string_memcpy/8/0/0               6 ns          6 ns  116610749   1.24646GB/s
BM_string_memcpy/64/0/0              6 ns          6 ns  115634093   9.84708GB/s
BM_string_memcpy/512/0/0            21 ns         21 ns   34167322   22.8938GB/s
BM_string_memcpy/1024/0/0           39 ns         39 ns   17859445   24.3312GB/s
BM_string_memcpy/8192/0/0          311 ns        310 ns    2260192   24.6325GB/s
BM_string_memcpy/16384/0/0         610 ns        608 ns    1151889   25.0987GB/s
BM_string_memcpy/32768/0/0        1488 ns       1482 ns     532508   20.5988GB/s
BM_string_memcpy/65536/0/0        2421 ns       2411 ns     290502   25.3146GB/s
BM_string_memcpy/131072/0/0       5278 ns       5256 ns     132710   23.2234GB/s

Test: Build and benchmark on marlin
Bug: http://b/63992911
Change-Id: Id85961aca18ba841bcbcfe0d8b162843eab30584
2018-05-30 11:09:19 -07:00
__memcpy_chk.S Split our FORTIFY implementation into libc_fortify 2017-07-24 14:20:16 -07:00
memchr.S libc: clean up ARM64 copyright notices 2017-05-04 12:59:53 -04:00
memcmp.S [AArch64] Optimized memcmp 2017-11-03 13:21:07 -04:00
memcpy.S Split our FORTIFY implementation into libc_fortify 2017-07-24 14:20:16 -07:00
memcpy_base.S libc: ARM64: update memset/strlen/memcpy/memmove to newlib/cortex-strings 2016-11-28 19:35:12 +00:00
memmove.S Use cortex-a53/bionic/memmove.S by default for arm64 2018-05-30 11:09:19 -07:00
memset.S libc: ARM64: fix memset for non-standard ZVA sizes 2017-05-16 11:29:49 +01:00
stpcpy.S Add optimized stpcpy. 2014-06-30 12:48:13 -07:00
strchr.S libc: clean up ARM64 copyright notices 2017-05-04 12:59:53 -04:00
strcmp.S bionic: arm64: generic: strcmp: align to 64B cache line 2017-03-20 17:54:29 +00:00
strcpy.S Add optimized stpcpy. 2014-06-30 12:48:13 -07:00
string_copy.S Regenerate the bionic NOTICE files. 2014-07-07 15:42:06 -07:00
strlen.S libc: ARM64: update memset/strlen/memcpy/memmove to newlib/cortex-strings 2016-11-28 19:35:12 +00:00
strncmp.S Add ARMv8 optimized string handling functions based on cortex-strings 2014-03-06 14:59:51 -08:00
strnlen.S Add ARMv8 optimized string handling functions based on cortex-strings 2014-03-06 14:59:51 -08:00
wmemmove.S Add optimized AArch64 versions of bcopy and wmemmove based on memmove 2014-05-23 18:49:57 -07:00