235 commits

Author SHA1 Message Date
Elliott Hughes
12773b6eca Merge "Simplify Oryon ifunc resolvers." into main 2024-04-16 15:06:14 +00:00
Elliott Hughes
f978a85cc3 Simplify Oryon ifunc resolvers.
Mainly just factoring out the code, but there are two functional
changes here too:

1. The inline assembler was missing `volatile`, making the hwcap
check ineffective (because the compiler would sometimes move the
MIDR_EL1 read above the hwcap check).

2. The previous code accepted variants 0x0 to 0x5 while the comment
said 0x1 to 0x5. The comment was correct.

I resisted the temptation to actually have a table to search on the assumption that it'll be a while before we need such a thing.
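
For illustration only, the variant check described above can be sketched in portable C. This is not the code from the change: the function names are hypothetical, the MIDR_EL1 value is passed in as a parameter so the sketch runs on any host, and the field layout (implementer in bits [31:24], variant in bits [23:20], Qualcomm's implementer code being 'Q', 0x51) is taken from the Arm architecture rather than from this commit. A real resolver would read the register with volatile inline assembler, and only after the hwcap check has passed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: packs implementer and variant the way they sit
 * in MIDR_EL1 bits [31:20] (implementer above, 4-bit variant below). */
static uint32_t make_cpu(uint32_t implementer, uint32_t variant) {
    return (implementer << 4) | variant;
}

/* Returns true if a raw MIDR_EL1 value names implementer 'Q' (0x51)
 * with a variant in 0x1..0x5. Variant 0x0 is rejected, matching the
 * comment that the commit message calls correct. */
bool midr_is_oryon(uint64_t midr) {
    uint32_t cpu = (uint32_t)((midr >> 20) & 0xfff);
    return cpu >= make_cpu('Q', 0x1) && cpu <= make_cpu('Q', 0x5);
}
```

A range compare on the packed value keeps the check to two comparisons, which fits the decision not to introduce a lookup table yet.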

Bug: https://issuetracker.google.com/330105715
Change-Id: I9fdc1e70e49b26ef32794b55ca5e5fd37f1163f9
2024-04-16 15:05:55 +00:00
Elliott Hughes
6937761c52 arm64: use L() in the handful of places we didn't already.
Change-Id: Ieb3cc5c9623291421c1d2fdc204e27812fee8ffd
2024-04-08 16:38:41 +00:00
Vaisakh K V
54a612187d Custom memset implementation for Qualcomm Oryon CPU
Submitted on behalf of a third-party: Linaro Limited

License rights, if any, to the submission are granted solely by the
copyright owner of such submission under its applicable intellectual
property.

Copyright (c) 2012, Linaro Limited
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
* Neither the name of the Linaro nor the
  names of its contributors may be used to endorse or promote products
  derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Origin Project URL: https://android.googlesource.com/platform/bionic/
Commit ID: 7e4fa56099

Third Party code includes additions/modifications from Qualcomm Innovation Center, Inc.

Test: All
Change-Id: I479a572a325e27262d27aa37c516618e4322e9bb
2024-03-29 13:35:04 +05:30
Vaisakh K V
83e55841ea Custom memcpy implementation for Qualcomm Oryon CPU
Submitted on behalf of a third-party: Arm Limited

License rights, if any, to the submission are granted solely by the
copyright owner of such submission under its applicable intellectual
property.

Copyright (c) 2012-2022, Arm Limited.
SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception

Origin Project URL: https://github.com/ARM-software/optimized-routines
Tag: v24.01

Third Party code includes additions/modifications from Qualcomm Innovation Center, Inc.

Test: All
Change-Id: I0c97398a435e3f8ddf8ad38bc6bd71cc0d78aea5
2024-03-29 13:25:10 +05:30
Elliott Hughes
cb47a4f671 Use ifuncs for memset and memrchr.
Not useful right now, but Qualcomm has an Oryon memset they'd like to
use, and there's no reason to treat memrchr as a weird special case.

Bug: https://issuetracker.google.com/330105715
Test: treehugger
Change-Id: Id879479bf4f45433debcb3fe08cfa96bb1eb3b93
2024-03-26 18:58:50 +00:00
Florian Mayer
73750dc38e Move memtag_stack out of libc_globals
We cannot use a WriteProtected because we are accessing it in a
multithreaded context.

Test: atest memtag_stack_dlopen_test w/ MTE
Test: atest bionic-unit-tests w/ MTE
Test: atest bionic-unit-tests on _fullmte
Bug: 328256432
Change-Id: I39faa75f97fd5b3fb755a46e88346c17c0e9a8e2
2024-03-12 12:42:23 -07:00
Florian Mayer
0e1412e08e Make memtag_handle_longjmp precise
We would get the SP inside of memtag_handle_longjmp, which could prevent
us from detecting the case where a longjmp is going into a function that
had already returned. This change makes the behaviour more predictable.

Change-Id: I75bf931c8f4129a2f38001156b7bbe0b54a726ee
2024-03-06 16:46:45 -08:00
Elliott Hughes
d7831208b2 Fix assembler warnings.
clang complains if you define a symbol and _then_ make it weak, rather
than the other way round:

  /tmp/setjmp-c3c977.s:90:1: warning: sigsetjmp changed binding to STB_WEAK
  .weak sigsetjmp;
  ^

Test: treehugger
Change-Id: Iee6b0ea456bb2e92aea810ce45f171caabaa89d2
2024-01-23 22:06:19 +00:00
Elliott Hughes
20f9d67327 Fix the *return* types in the arm64 dynamic function dispatch.
No actual effect on the code, but misleading and wrong. (The previous
change only fixed the argument types; I didn't notice that some of the
return types were wrong too.)

Test: treehugger
Change-Id: I1ee5c48e2652fd8cbf8178d5659e57f79e61898e
2023-05-22 19:28:33 +00:00
Elliott Hughes
a1974064ae Fix the types in the arm64 dynamic function dispatch.
No actual effect on the code, but misleading and wrong.

Test: treehugger
Change-Id: I55405ac224b4dcc2ae515954aed179c1cde3c73c
2023-05-18 13:40:12 -07:00
Peter Collingbourne
b6a592b25b Make fork equivalent to vfork when HWASan or MTE stack tagging is enabled.
Bug: 274056091
Change-Id: Iac029ca6b0e26f57f20c0a54822b75e3cae67344
2023-05-08 15:26:00 -07:00
Elliott Hughes
9a7155dbbd riscv64 SCS support.
Bug: https://github.com/google/android-riscv64/issues/55
Test: treehugger
Change-Id: I05d48a07a302305126942d38529ffa280640c7b7
2023-02-23 01:21:07 +00:00
Elliott Hughes
3d8e98f8bd Add (no-op) ifuncs for SVE optimized routines.
This patch doesn't *enable* the SVE optimized routines, but it does let
us see if switching them to ifuncs will cause any app compat issues, so
that we can more easily use the optimized routines in future.

Test: treehugger
Change-Id: Ic5fe570bd21687da397b48127bf688f7ec68dd0c
2023-01-25 23:33:39 +00:00
Elliott Hughes
5ec0bfae50 Track upstream arm-optimized-routines changes.
The MTE-compatible routines are now faster than the incompatible ones,
so they merged them upstream.

I've left the ifunc boilerplate on the assumption that I'll be back
later to enable the new SVE variants.

Test: treehugger
Change-Id: Ic894bfb350b9aa70e307bca1c4978624b3e5f4fd
2023-01-25 18:12:18 +00:00
Elliott Hughes
023e4e7840 Move to arm-optimized-routines memset().
This one's a bit simpler, because there is only one upstream memset()
implementation.

Test: treehugger
Change-Id: I2536d0eb72adaacfa6a0e40d2bd29fc833988c16
2022-11-17 19:28:06 +00:00
Elliott Hughes
7daf4596b7 Switch to the arm-optimized-routines memcpy() and memmove().
Outsource this to them, and choose the best of the two options available
based on the hardware we're running on.

Test: treehugger
Change-Id: I2fa7555c971b64a6decca132210e901ffa248efa
2022-11-17 00:38:49 +00:00
Treehugger Robot
d26d3c0b5c Merge "Implement __memset_chk as a copy & paste of __memcpy_chk." 2022-11-16 23:33:14 +00:00
Treehugger Robot
6c599e3a67 Merge "Move memcpy_base.S into memcpy.S." 2022-11-16 22:11:53 +00:00
Elliott Hughes
3cc366d3a2 Implement __memset_chk as a copy & paste of __memcpy_chk.
These two will stay behind when we move memcpy()/memmove()/memset() over
to arm-optimized-routines (which leaves fortify to us).

Test: treehugger
Change-Id: Ie683f71a5a141263ce3f4e8811df9eaf667584f4
2022-11-16 21:07:56 +00:00
Elliott Hughes
d5ac40cc9f Move memcpy_base.S into memcpy.S.
Just to make it clear that there's nothing interesting going on here ---
there's just one user, and the only symbol here is __memcpy().

Test: treehugger
Change-Id: I62d72c43c4c6d30442f05c1e08a0cb1a1ec42a8a
2022-11-16 18:50:54 +00:00
Elliott Hughes
0d4d276253 Remove assembler wmemmove().
The compiler turns our C wmemmove() into one shift instruction and a
branch, which is plenty for a function no-one uses anyway.
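
As a rough sketch of why the C version compiles down to so little, a wmemmove() can be written as a thin wrapper over memmove(). The name sketch_wmemmove is hypothetical, chosen so it does not clash with the libc symbol, and this is not claimed to be the exact source in the tree.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical name, to avoid clashing with the libc wmemmove(). */
wchar_t *sketch_wmemmove(wchar_t *dst, const wchar_t *src, size_t n) {
    /* n * sizeof(wchar_t) is a multiply by a power of two, which the
     * compiler can lower to a shift; the call itself can become a
     * tail branch to memmove(). */
    return (wchar_t *)memmove(dst, src, n * sizeof(wchar_t));
}
```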

Why don't I just leave this alone, since we already have it? Because I'm
looking at finishing the project of "switch to arm-optimized-routines"
and getting rid of our assembler here, and Arm agrees that this isn't
worth having optimized assembler for in their optimized assembler
project, judging by its absence.

Test: treehugger
Change-Id: I985801241a8cbd7dbda51a447946affb1402effb
2022-11-16 18:44:56 +00:00
Elliott Hughes
faac8e658c arm64: remove unnecessary duplication of constants in vfork.S.
Test: treehugger
Change-Id: I41fd22bad0581269c88f5b3bb499735ab6ecafd2
2022-10-14 21:36:58 +00:00
Evgenii Stepanov
3031a7e45e memtag_stack: vfork and longjmp support.
With memtag_stack, each function is responsible for cleaning up
allocation tags for its stack frame. Allocation tags for anything below
SP must match the address tag in SP.

Both vfork and longjmp implement non-local control transfer which
abandons part of the stack without proper cleanup. Update allocation
tags:
* For longjmp, we know both source and destination values of SP.
* For vfork, save the value of SP before exit() or exec*() - the only
  valid ways of ending the child process according to POSIX - and reset
  tags from there to SP-in-parent.

This is not 100% solid and can be confused by a number of hopefully
uncommon conditions:
* Segmented stacks.
* Longjmp from sigaltstack into the main stack.
* Some kind of userspace thread implementation using longjmp (that's UB,
  longjmp can only return to the caller on the current stack).
* and other strange things.

This change adds a sanity limit on the size of the tag cleanup. Also,
this logic is only activated in the binaries that carry the
NT_MEMTAG_STACK note (set by -fsanitize=memtag-stack) which is meant as
a debugging configuration, is not compatible with pre-armv9 CPUs, and
should not be set on production code.
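
The longjmp case with a sanity limit can be sketched as a small range computation. Everything here is an assumption made for illustration: the function name, the CLEANUP_LIMIT value, and the choice to skip cleanup entirely when the limit is exceeded are hypothetical. Only the 16-byte MTE granule size is an architectural fact.

```c
#include <stddef.h>
#include <stdint.h>

#define GRANULE 16u                /* MTE tag granule size in bytes */
#define CLEANUP_LIMIT (16u << 20)  /* assumed sanity limit, not the real value */

/* Number of granules in [sp_now, sp_target) whose tags would need to be
 * reset, or 0 if there is nothing to do or the range looks implausible. */
size_t granules_to_retag(uintptr_t sp_now, uintptr_t sp_target) {
    if (sp_target <= sp_now) return 0;   /* not unwinding toward older frames */
    uintptr_t len = sp_target - sp_now;
    if (len > CLEANUP_LIMIT) return 0;   /* probably a different stack: skip */
    return (size_t)((len + GRANULE - 1) / GRANULE);
}
```

The limit is what keeps cases such as a longjmp from a sigaltstack into the main stack from triggering a huge, meaningless retagging pass.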

Bug: b/174878242
Test: fvp_mini with ToT LLVM (more tests in a separate change)

Change-Id: Ibef8b2fc5a6ce85c8e562dead1019964d9f6b80b
2022-05-27 13:19:34 -07:00
Mitch Phillips
93400371f7 [NFCI] Change Android's NT_TYPE to NT_ANDROID_TYPE.
Normally, platform-specific note types in the toolchain are prefixed
with the platform name. Because we're exposing the NT_TYPE_MEMTAG and
synthesizing the note in the toolchain in an upcoming patch
(https://reviews.llvm.org/D118948), it's been requested that we change
the name to include the platform prefix.

While NT_TYPE_IDENT and NT_TYPE_KUSER aren't known about or synthesized
by the toolchain, update those references as well for consistency.

Bug: N/A
Test: Build Android
Change-Id: I7742e4917ae275d59d7984991664ea48028053a1
2022-02-07 13:49:20 -08:00
Elliott Hughes
c0d41db92e setjmp/longjmp: avoid invalid values in the stack pointer.
arm64 was already being careful, but x86/x86-64 and 32-bit ARM could be
caught by a signal in a state where the stack pointer was mangled.

For 32-bit ARM I've taken care with the link register too, to avoid
potential issues with unwinding.

Bug: http://b/152210274
Test: treehugger
Change-Id: I1ce285b017a633c732dbe04743368f4cae27af85
2021-04-05 17:43:36 -07:00
Elliott Hughes
3e1d5563b6 PAC/BTI: no need to keep using hint.
The toolchain is new enough that we should be able to use the actual
instructions now...

Test: treehugger
Change-Id: I30aafcdc5386268344c40dc6cc9a22caf591915a
2021-01-25 08:49:01 -08:00
Peter Collingbourne
7e20117a36 Remove ANDROID_EXPERIMENTAL_MTE.
Now that the feature guarded by this flag has landed in Linux 5.10
we no longer need the flag, so we can remove it.

Bug: 135772972
Change-Id: I02fa50848cbd0486c23c8a229bb8f1ab5dd5a56f
2021-01-11 10:55:51 -08:00
Evgenii Stepanov
8564b8d9e6 Use ELF notes to set the desired memory tagging level.
Use a note in executables to specify
(none|sync|async) heap tagging level. To be extended with (heap x stack x
globals) in the future. A missing note disables all tagging.

Bug: b/135772972
Test: bionic-unit-tests (in a future change)

Change-Id: Iab145a922c7abe24cdce17323f9e0c1063cc1321
2021-01-06 16:08:18 -08:00
Tamas Petz
f5bdee7fdf libc: Add Armv8.3-A PAuth and Armv8.5-A BTI compatibility to *.S
The most notable change is in sigsetjmp/siglongjmp. The former
stores LR signed with the current SP into jmp_buf. Calling siglongjmp
reads a signed LR and the corresponding SP from jmp_buf. This way,
integrity protection comes not only from the checksum but from
Pointer Authentication too.

Test: Tested on FVP with BTI enabled.

Change-Id: I9d720239775f8d2829a677901f546c4b14b5cbe5
2020-09-04 11:29:12 +02:00
Peter Collingbourne
2361d4ef80 Adopt remaining MTE string routines.
ARM has released the remaining MTE string routines, so let's start
using them. The strnlen implementation is now compatible with MTE,
so it no longer needs to be an ifunc.

Bug: 135772972
Change-Id: I9de7fb44447aa1b878f4ad3f62cb0129857b43ad
2020-06-11 08:52:26 -07:00
Josh Gao
2303283740 Track whether a thread is currently vforked.
Our various fd debugging facilities get extremely confused by a vforked
process closing file descriptors in preparation to exec: fdsan can
abort, and fdtrack will delete backtraces for any file descriptors that
get closed. Keep track of whether we're in a vforked child in order to
be able to detect this.

Bug: http://b/153926671
Test: 32/64-bit bionic-unit-tests on blueline, x86_64 emulator
Change-Id: I8a082fd06bfdfef0e2a88dbce350b6f667f7df9f
2020-05-07 19:44:27 -07:00
Peter Collingbourne
337a5b3f9a Switch to the arm-optimized-routines string routines on aarch64 where possible.
This includes optimized strrchr and strchrnul routines, and an MTE-compatible
strlen routine.

Bug: 135772972
Change-Id: I48499f757cdc6d3e77e5649123d45b17dfa3c6b0
2020-02-25 13:11:55 -08:00
Peter Collingbourne
900d07d6a1 Add arm64 string.h function implementations for use with hardware supporting MTE.
As it turns out, our "generic" arm64 implementations of certain string.h
functions are not actually generic, since they will eagerly read memory
possibly outside of the bounds of an MTE granule, which may lead to a segfault
on MTE-enabled hardware. Therefore, move the implementations into a "default"
directory and use ifuncs to select between them and a new set of "mte"
implementations, conditional on whether the hardware and kernel support MTE.

The MTE implementations are currently naive implementations written in C
but will later be replaced with a set of optimized assembly implementations.
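
A minimal sketch of the selection logic, written as an ordinary function rather than a real ifunc resolver so that it runs on any host. The names are hypothetical, SKETCH_HWCAP2_MTE is assumed to mirror the Linux arm64 HWCAP2_MTE bit, and the "default" flavor simply forwards to the libc strlen() as a stand-in for the optimized assembly.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_HWCAP2_MTE (1ul << 18)  /* assumed to mirror Linux's HWCAP2_MTE */

typedef size_t (*strlen_fn)(const char *);

/* "mte" flavor: naive C that never reads past the terminating NUL. */
size_t strlen_mte(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') n++;
    return n;
}

/* Stand-in for the "default" flavor (the existing optimized routine). */
size_t strlen_default(const char *s) {
    return strlen(s);
}

/* Picks an implementation from the hwcap2 bits, as a resolver would. */
strlen_fn resolve_strlen(unsigned long hwcap2) {
    return (hwcap2 & SKETCH_HWCAP2_MTE) ? strlen_mte : strlen_default;
}
```

Both flavors must return the same results; only their memory access pattern differs, which is why the choice can be made once at resolution time.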

Bug: 135772972
Change-Id: Ife37c4e0e6fd60ff20a34594cc09c541af4d1dd7
2019-10-29 16:18:31 -07:00
Christopher Ferris
b8a95e2186 Update to kernel headers v5.3.2.
Test: Builds and run unit tests on taimen/cuttlefish.
Change-Id: I6ebd8f179d159ac974555e8edca588083e8081b3
2019-10-03 10:59:32 -07:00
Christopher Ferris
c5d3a4348a Make tls related header files platform accessible.
There are places in frameworks and art code that directly included
private bionic header files. Move these files to the new platform
include files.

This change also moves the __get_tls.h header file to tls.h and includes
the tls defines header so that there is a single header that platform
code can use to get __get_tls and the defines.

Also, simplify the visibility rules for platform includes.

Bug: 141560639

Test: Builds and bionic unit tests pass.
Change-Id: I9e5e9c33fe8a85260f69823468bc9d340ab7a1f9
Merged-In: I9e5e9c33fe8a85260f69823468bc9d340ab7a1f9
(cherry picked from commit 44631c919a)
2019-09-27 12:14:24 -07:00
Elliott Hughes
782c485880 Generate assembler system call stubs via genrule.
There's no need to check in generated code.

Test: builds & boots
Change-Id: Ife368bca4349d4adeb0666db590356196b4fbd63
2019-04-16 12:31:00 -07:00
Elliott Hughes
d67b03734d libc: generate syscall stubs in one big file...
...all the better to switch to a genrule rather than checking in
generated source.

This also removes all the code in the script to deal with git,
rather than fix it. We won't need that where we're going.

Test: boots
Change-Id: I468ce019d4232a7ef27e5cb5cfd89f4c2fe4ecbd
2019-04-16 00:54:11 +00:00
Evgenii Stepanov
505168e530 Annotate vfork for hwasan.
Call a hwasan hook in the parent return path for vfork() to let hwasan
update its shadow. See https://github.com/google/sanitizers/issues/925
for more details.

Bug: 112438058
Test: bionic-unit-tests
Change-Id: I9a06800962913e822bd66e072012d0a2c5be453d
2019-03-19 23:36:44 +00:00
Ryan Prichard
82aea78136 Use TLS_SLOT_THREAD_ID macro in vfork.S
No functional change intended.

Bug: none
Test: bionic unit tests
Change-Id: I7ee0a2b3f0e3807abe88bfa34ef3cd56c150a8f6
2019-01-16 01:11:26 -08:00
Haibo Huang
3927db1d5b Remove denver64 from libc
Test: compile
Change-Id: Ifcbe15c1682b4e1e18835e38915b2421196882f7
2018-11-30 22:28:39 +00:00
Peter Collingbourne
734beec3d4 Allocate a small guard region around the shadow call stack.
This lets us do two things:

1) Make setjmp and longjmp compatible with shadow call stack.
   To avoid leaking the shadow call stack address into memory, only the
   lower log2(SCS_SIZE) bits of x18 are stored to jmp_buf. This requires
   allocating an additional guard page so that we're guaranteed to be
   able to allocate a sufficiently aligned SCS.

2) SCS overflow detection. Overflows now result in a SIGSEGV instead
   of corrupting the allocation that comes after it.
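
The partial save and restore in point 1 can be sketched with plain integer arithmetic. The helper names are hypothetical and the SCS_SIZE value is an assumption for the example; the scheme only needs it to be a power of two and the SCS to be aligned to it, which is what the extra guard page makes possible.

```c
#include <stdint.h>

#define SCS_SIZE (16u * 1024u)  /* assumed value; must be a power of two */
#define SCS_MASK ((uintptr_t)SCS_SIZE - 1)

/* What setjmp would store: only the low log2(SCS_SIZE) bits of x18,
 * so the SCS address itself never lands in jmp_buf. */
uintptr_t scs_bits_to_save(uintptr_t x18) {
    return x18 & SCS_MASK;
}

/* What longjmp would do: keep the high bits of the live x18 (which
 * identify the SCS) and splice in the saved offset. */
uintptr_t scs_restore(uintptr_t current_x18, uintptr_t saved_bits) {
    return (current_x18 & ~SCS_MASK) | saved_bits;
}
```

This works because every pointer into a SCS_SIZE-aligned region shares the same high bits, so the live x18 at longjmp time supplies them.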

Change-Id: I04d6634f96162bf625684672a87fba8b402b7fd1
Test: bionic-unit-tests
2018-11-16 14:37:08 -08:00
Treehugger Robot
a2a114ba26 Merge "Annotate siglongjmp for HWASan." 2018-09-06 21:35:09 +00:00
Evgenii Stepanov
b16e9ce7b8 Annotate siglongjmp for HWASan.
HWASan needs to re-tag the newly unallocated stack space to match SP.

Bug: 112438058
Test: SANITIZE_TARGET=hwaddress

Change-Id: I4dddef542d802d63bdea59e32a03425a2c4f870b
2018-09-05 13:37:14 -07:00
Evgenii Stepanov
00d087c629 (arm64) Extend branch range in __memcpy_chk.
Conditional branch has limited range (1MB) and cannot be extended by
the linker. The current distance (in walleye build) is 500KB, about
half of the maximum. HWASan pushes it over the limit.

Replace conditional branch with regular branch, which has longer
range (26 vs 19 bits offset) and can be extended in the linker if
needed.
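
The ranges quoted above follow from the immediate widths: A64 branch offsets are signed and counted in 4-byte instructions. A small sketch of the arithmetic (the function name is hypothetical):

```c
#include <stdint.h>

/* Approximate reach, in bytes in each direction, of an A64 branch whose
 * offset is a signed immediate of imm_bits bits scaled by 4 bytes.
 * One bit is the sign, and the scale of 4 adds two more bits of reach. */
int64_t branch_reach_bytes(unsigned imm_bits) {
    return (int64_t)1 << (imm_bits - 1 + 2);
}
```

With 19 bits this gives 1 MiB for the conditional branch, and with 26 bits it gives 128 MiB for the regular branch, so a distance of about 500KB uses roughly half of the conditional range.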

Bug: 112437884
Bug: 12231437
Test: SANITIZE_TARGET=hwaddress

Change-Id: Idc083fb557ab3a859541beb009809992406a6703
2018-08-31 15:02:12 -07:00
Adhemerval Zanella
65a6211f4d [AArch64] Improve strncmp for mutually misaligned inputs
This patch was originally written by Siddhesh Poyarekar and pushed on
cortex-strings [1]. The mutually misaligned inputs on aarch64 are
compared with a simple byte-by-byte loop, which is not very efficient.
This patch enhances the comparison similar to strcmp by loading a
double-word at a time.
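
To give a feel for the double-word approach, here is a portable C sketch. It is not the assembly from the patch and it does not reproduce its alignment handling. It also makes one loud assumption that the real routine does not need: both buffers must be readable for the full n bytes, since the sketch loads 8 bytes at a time without checking page or allocation boundaries. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any byte of w is zero (classic bit trick). */
static int has_zero_byte(uint64_t w) {
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

/* ASSUMPTION: a and b are both readable for n bytes. */
int strncmp_wordwise(const char *a, const char *b, size_t n) {
    size_t i = 0;
    /* Fast path: skip 8 bytes at a time while the words are equal
     * and contain no NUL. memcpy makes the unaligned loads legal C. */
    while (n - i >= 8) {
        uint64_t wa, wb;
        memcpy(&wa, a + i, 8);
        memcpy(&wb, b + i, 8);
        if (wa != wb || has_zero_byte(wa)) break;
        i += 8;
    }
    /* Slow path: settle the result byte by byte. */
    for (; i < n; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        if (ca != cb) return ca < cb ? -1 : 1;
        if (ca == 0) return 0;
    }
    return 0;
}
```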

Comparison on the default bionic and proposed optimized routines
shows the following performance improvements on A54 (using the
new proposed memcmp input data from test_strncmp.xml):

  - No noticeable change on aligned inputs or with same alignment.

  - Large improvements on unaligned inputs from sizes larger than
    16 bytes.

Benchmark                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------
BM_string_strncmp/1/0/0              -0.0954         -0.0954            19            17            19            17
BM_string_strncmp/2/0/0              -0.0344         -0.0344            19            18            19            18
BM_string_strncmp/3/0/0              +0.1768         +0.1768            15            18            15            18
BM_string_strncmp/4/0/0              -0.0344         -0.0344            19            18            19            18
BM_string_strncmp/5/0/0              -0.0344         -0.0344            19            18            19            18
BM_string_strncmp/6/0/0              +0.1589         +0.1589            15            18            15            18
BM_string_strncmp/7/0/0              -0.0344         -0.0344            19            18            19            18
BM_string_strncmp/8/0/0              -0.0998         -0.0998            19            17            19            17
BM_string_strncmp/9/0/0              -0.0277         -0.0277            23            22            23            22
BM_string_strncmp/10/0/0             -0.0270         -0.0270            23            22            23            22
BM_string_strncmp/11/0/0             -0.0331         -0.0331            23            22            23            22
BM_string_strncmp/12/0/0             -0.0270         -0.0270            23            22            23            22
BM_string_strncmp/13/0/0             -0.0284         -0.0284            23            22            23            22
BM_string_strncmp/14/0/0             +0.1042         +0.1042            20            22            20            22
BM_string_strncmp/15/0/0             -0.0277         -0.0277            23            22            23            22
BM_string_strncmp/16/0/0             +0.0214         +0.0215            22            22            22            22
BM_string_strncmp/24/0/0             -0.1291         -0.1291            24            21            24            21
BM_string_strncmp/32/0/0             -0.0470         -0.0470            27            26            27            26
BM_string_strncmp/40/0/0             -0.0433         -0.0433            29            28            29            28
BM_string_strncmp/48/0/0             -0.0301         -0.0301            31            30            31            30
BM_string_strncmp/56/0/0             -0.0800         -0.0800            33            31            33            31
BM_string_strncmp/64/0/0             +0.0188         +0.0188            34            34            34            34
BM_string_strncmp/72/0/0             -0.0334         -0.0334            38            37            38            37
BM_string_strncmp/80/0/0             -0.0000         -0.0000            40            40            40            40
BM_string_strncmp/88/0/0             +0.0413         +0.0413            61            64            61            64
BM_string_strncmp/96/0/0             -0.0215         -0.0216            69            67            69            67
BM_string_strncmp/104/0/0            -0.0208         -0.0208            72            70            72            70
BM_string_strncmp/112/0/0            -0.0173         -0.0173            75            74            75            74
BM_string_strncmp/120/0/0            -0.0166         -0.0166            78            77            78            77
BM_string_strncmp/128/0/0            -0.0158         -0.0158            81            80            81            80
BM_string_strncmp/136/0/0            -0.0149         -0.0149            84            83            84            83
BM_string_strncmp/144/0/0            -0.0201         -0.0201            88            86            88            86
BM_string_strncmp/160/0/0            -0.0136         -0.0136            94            93            94            93
BM_string_strncmp/176/0/0            +0.0224         +0.0224            96            98            96            98
BM_string_strncmp/192/0/0            +0.0289         +0.0289           102           105           102           105
BM_string_strncmp/208/0/0            +0.0101         +0.0101           111           112           111           112
BM_string_strncmp/224/0/0            -0.0107         -0.0107           119           118           119           118
BM_string_strncmp/240/0/0            -0.0088         -0.0088           126           125           126           125
BM_string_strncmp/256/0/0            -0.0101         -0.0101           132           131           132           131
BM_string_strncmp/512/0/0            -0.0056         -0.0056           235           233           235           233
BM_string_strncmp/1024/0/0           -0.0030         -0.0030           439           437           439           437
BM_string_strncmp/8192/0/0           -0.0431         -0.0431          3799          3635          3799          3635
BM_string_strncmp/16384/0/0          -0.0069         -0.0069          6778          6732          6779          6732
BM_string_strncmp/32768/0/0          -0.0001         -0.0002         13405         13403         13405         13403
BM_string_strncmp/65536/0/0          +0.0005         +0.0005         26968         26981         26968         26981
BM_string_strncmp/131072/0/0         -0.0057         -0.0057         53959         53650         53958         53650
BM_string_strncmp/1/4/0              -0.1352         -0.1352            12            10            12            10
BM_string_strncmp/2/4/0              +0.0020         +0.0020            15            15            15            15
BM_string_strncmp/3/4/0              -0.1560         -0.1560            20            17            20            17
BM_string_strncmp/4/4/0              +0.0296         +0.0296            22            22            22            22
BM_string_strncmp/5/4/0              +0.0573         +0.0573            22            23            22            23
BM_string_strncmp/6/4/0              -0.0340         -0.0340            25            24            25            24
BM_string_strncmp/7/4/0              +0.0185         +0.0185            26            26            26            26
BM_string_strncmp/8/4/0              -0.0050         -0.0050            27            27            27            27
BM_string_strncmp/9/4/0              -0.1294         -0.1294            28            24            28            24
BM_string_strncmp/10/4/0             +0.0109         +0.0109            29            29            29            29
BM_string_strncmp/11/4/0             -0.0000         -0.0001            30            30            30            30
BM_string_strncmp/12/4/0             +0.0055         +0.0055            50            50            50            50
BM_string_strncmp/13/4/0             -0.0249         -0.0249            51            50            51            50
BM_string_strncmp/14/4/0             -0.0289         -0.0289            53            52            53            52
BM_string_strncmp/15/4/0             -0.0205         -0.0205            55            54            55            54
BM_string_strncmp/16/4/0             -0.4616         -0.4616            57            31            57            31
BM_string_strncmp/24/4/0             -0.4871         -0.4871            72            37            72            37
BM_string_strncmp/32/4/0             -0.5549         -0.5549            87            39            87            39
BM_string_strncmp/40/4/0             -0.5964         -0.5964           103            42           103            42
BM_string_strncmp/48/4/0             -0.6647         -0.6647           118            40           118            40
BM_string_strncmp/56/4/0             -0.6551         -0.6551           134            46           134            46
BM_string_strncmp/64/4/0             -0.6609         -0.6609           145            49           145            49
BM_string_strncmp/72/4/0             -0.5709         -0.5710           164            70           164            70
BM_string_strncmp/80/4/0             -0.5929         -0.5929           180            73           180            73
BM_string_strncmp/88/4/0             -0.6051         -0.6051           195            77           195            77
BM_string_strncmp/96/4/0             -0.6160         -0.6160           210            81           210            81
BM_string_strncmp/104/4/0            -0.6199         -0.6199           223            85           223            85
BM_string_strncmp/112/4/0            -0.6293         -0.6293           240            89           240            89
BM_string_strncmp/120/4/0            -0.6439         -0.6439           255            91           255            91
BM_string_strncmp/128/4/0            -0.6493         -0.6493           271            95           271            95
BM_string_strncmp/136/4/0            -0.6704         -0.6704           287            95           287            95
BM_string_strncmp/144/4/0            -0.6744         -0.6744           302            98           302            98
BM_string_strncmp/160/4/0            -0.6700         -0.6700           333           110           333           110
BM_string_strncmp/176/4/0            -0.6821         -0.6821           364           116           364           116
BM_string_strncmp/192/4/0            -0.6887         -0.6887           394           123           394           123
BM_string_strncmp/208/4/0            -0.6949         -0.6949           425           130           425           130
BM_string_strncmp/224/4/0            -0.7069         -0.7069           456           134           456           134
BM_string_strncmp/240/4/0            -0.7042         -0.7042           486           144           486           144
BM_string_strncmp/256/4/0            -0.7043         -0.7043           514           152           514           152
BM_string_strncmp/1/0/4              +0.0227         +0.0227            14            14            14            14
BM_string_strncmp/2/0/4              +0.0442         +0.0442            15            16            15            16
BM_string_strncmp/3/0/4              +0.5829         +0.5829            17            27            17            27
BM_string_strncmp/4/0/4              -0.1593         -0.1593            22            19            22            19
BM_string_strncmp/5/0/4              -0.0516         -0.0516            23            22            23            22
BM_string_strncmp/6/0/4              -0.1684         -0.1684            25            20            25            20
BM_string_strncmp/7/0/4              +0.0170         +0.0170            26            26            26            26
BM_string_strncmp/8/0/4              +0.0006         +0.0006            27            27            27            27
BM_string_strncmp/9/0/4              +0.1272         +0.1272            25            28            25            28
BM_string_strncmp/10/0/4             +0.0108         +0.0108            29            29            29            29
BM_string_strncmp/11/0/4             -0.0001         -0.0001            30            30            30            30
BM_string_strncmp/12/0/4             -0.3557         -0.3557            50            32            50            32
BM_string_strncmp/13/0/4             -0.3370         -0.3370            51            34            51            34
BM_string_strncmp/14/0/4             -0.3444         -0.3444            53            35            53            35
BM_string_strncmp/15/0/4             +0.0946         +0.0946            51            56            51            56
BM_string_strncmp/16/0/4             -0.5203         -0.5203            53            25            53            25
BM_string_strncmp/24/0/4             -0.6109         -0.6109            72            28            72            28
BM_string_strncmp/32/0/4             -0.6934         -0.6934            88            27            88            27
BM_string_strncmp/40/0/4             -0.6833         -0.6833           103            33           103            33
BM_string_strncmp/48/0/4             -0.6973         -0.6973           118            36           118            36
BM_string_strncmp/56/0/4             -0.7116         -0.7116           134            39           134            39
BM_string_strncmp/64/0/4             -0.6017         -0.6018           149            59           149            59
BM_string_strncmp/72/0/4             -0.6268         -0.6268           164            61           164            61
BM_string_strncmp/80/0/4             -0.6409         -0.6409           179            64           179            64
BM_string_strncmp/88/0/4             -0.6465         -0.6465           195            69           195            69
BM_string_strncmp/96/0/4             -0.6551         -0.6551           210            72           210            72
BM_string_strncmp/104/0/4            -0.6662         -0.6662           227            76           227            76
BM_string_strncmp/112/0/4            -0.6700         -0.6700           240            79           240            79
BM_string_strncmp/120/0/4            -0.6740         -0.6740           256            83           256            83
BM_string_strncmp/128/0/4            -0.6862         -0.6862           271            85           271            85
BM_string_strncmp/136/0/4            -0.6883         -0.6883           287            89           287            89
BM_string_strncmp/144/0/4            -0.7031         -0.7031           297            88           297            88
BM_string_strncmp/160/0/4            -0.6985         -0.6985           333           100           333           100
BM_string_strncmp/176/0/4            -0.7082         -0.7082           364           106           364           106
BM_string_strncmp/192/0/4            -0.7223         -0.7223           396           110           396           110
BM_string_strncmp/208/0/4            -0.7135         -0.7135           421           121           421           121
BM_string_strncmp/224/0/4            -0.7194         -0.7194           455           128           455           128
BM_string_strncmp/240/0/4            -0.7233         -0.7233           487           135           487           135
BM_string_strncmp/256/0/4            -0.7239         -0.7239           516           143           516           143
BM_string_strncmp/1/4/4              +0.0224         +0.0225            21            22            21            22
BM_string_strncmp/2/4/4              -0.0001         -0.0001            22            22            22            22
BM_string_strncmp/3/4/4              -0.0001         -0.0001            22            22            22            22
BM_string_strncmp/4/4/4              -0.0435         -0.0435            22            21            22            21
BM_string_strncmp/5/4/4              -0.0118         -0.0118            27            27            27            27
BM_string_strncmp/6/4/4              -0.0118         -0.0118            27            27            27            27
BM_string_strncmp/7/4/4              -0.0117         -0.0117            27            27            27            27
BM_string_strncmp/8/4/4              -0.0118         -0.0118            27            27            27            27
BM_string_strncmp/9/4/4              -0.0117         -0.0117            27            27            27            27
BM_string_strncmp/10/4/4             +0.1447         +0.1447            23            27            23            27
BM_string_strncmp/11/4/4             -0.0062         -0.0062            27            27            27            27
BM_string_strncmp/12/4/4             -0.0454         -0.0454            28            27            28            27
BM_string_strncmp/13/4/4             -0.1507         -0.1507            29            24            29            24
BM_string_strncmp/14/4/4             -0.0003         -0.0003            29            29            29            29
BM_string_strncmp/15/4/4             -0.0002         -0.0003            29            29            29            29
BM_string_strncmp/16/4/4             +0.0047         +0.0047            29            29            29            29
BM_string_strncmp/24/4/4             -0.0104         -0.0104            31            30            31            30
BM_string_strncmp/32/4/4             -0.0290         -0.0290            33            32            33            32
BM_string_strncmp/40/4/4             -0.0189         -0.0189            34            33            34            33
BM_string_strncmp/48/4/4             -0.0059         -0.0059            36            36            36            36
BM_string_strncmp/56/4/4             +0.0000         +0.0000            39            39            39            39
BM_string_strncmp/64/4/4             +0.0000         +0.0000            42            42            42            42
BM_string_strncmp/72/4/4             +0.0000         +0.0000            45            45            45            45
BM_string_strncmp/80/4/4             +0.0391         +0.0392            65            68            65            68
BM_string_strncmp/88/4/4             -0.0090         -0.0090            71            70            71            70
BM_string_strncmp/96/4/4             -0.0034         -0.0034            74            74            74            74
BM_string_strncmp/104/4/4            -0.0482         -0.0482            77            73            77            73
BM_string_strncmp/112/4/4            +0.0387         +0.0387            77            80            77            80
BM_string_strncmp/120/4/4            -0.0072         -0.0073            84            83            84            83
BM_string_strncmp/128/4/4            -0.0071         -0.0071            87            86            87            86
BM_string_strncmp/136/4/4            +0.0366         +0.0366            86            89            86            89
BM_string_strncmp/144/4/4            -0.0068         -0.0068            93            93            93            93
BM_string_strncmp/160/4/4            -0.0064         -0.0064           100            99           100            99
BM_string_strncmp/176/4/4            -0.0063         -0.0063           106           105           106           105
BM_string_strncmp/192/4/4            -0.0012         -0.0012           112           112           112           112
BM_string_strncmp/208/4/4            -0.0098         -0.0098           119           118           119           118
BM_string_strncmp/224/4/4            -0.0050         -0.0050           125           125           125           125
BM_string_strncmp/240/4/4            -0.0060         -0.0060           132           131           132           131
BM_string_strncmp/256/4/4            -0.0046         -0.0046           138           137           138           137

[1] Commit id: 26cc4faec37a55529e5d0a39949f7b6ec81008f9

Test: bionic tests and benchmarks on aarch64.
Change-Id: Ied579d2044b4092fc95fad486af6541d1eb71dc3
2018-08-21 19:50:09 +00:00
Adhemerval Zanella
b42ff1b5c3 [AArch64] Improve strcmp performance for misaligned strings
This patch was originally written by Siddhesh Poyarekar and pushed on
cortex-strings [1]. Replace the simple byte-wise compare in the
misaligned case with a dword compare with page boundary checks in
place. For simplicity its uses a 4K page boundary so that it does not
have to query the actual page size on the system.
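
As a rough illustration of the idea described above (not the assembly
from this patch), the following C sketch compares 8 bytes at a time and
drops to a byte-wise step whenever an 8-byte load would cross an assumed
4K boundary. The names load8_crosses_4k, has_zero_byte and sketch_strcmp
are hypothetical, and the word loads may read past the terminating NUL,
which is only safe at the machine level within a page; in portable C the
caller would need buffers padded by at least 8 bytes.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: a fixed 4K page, mirroring the simplification in the commit text. */
#define SKETCH_PAGE_SIZE 4096u

/* Nonzero if an 8-byte load starting at addr would cross a 4K boundary. */
static int load8_crosses_4k(uintptr_t addr) {
    return (addr & (SKETCH_PAGE_SIZE - 1)) > SKETCH_PAGE_SIZE - 8;
}

/* Nonzero if any byte of x is zero (classic bit trick). */
static int has_zero_byte(uint64_t x) {
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/* Hypothetical strcmp sketch: dword compare when safe, byte step otherwise. */
static int sketch_strcmp(const char *a, const char *b) {
    for (;;) {
        if (!load8_crosses_4k((uintptr_t)a) && !load8_crosses_4k((uintptr_t)b)) {
            uint64_t wa, wb;
            memcpy(&wa, a, 8); /* unaligned-safe 8-byte loads */
            memcpy(&wb, b, 8);
            if (wa == wb && !has_zero_byte(wa)) {
                a += 8;
                b += 8;
                continue;
            }
        }
        /* Near a page boundary, or the words differ or contain a NUL:
           resolve one byte at a time. A tuned routine would instead
           locate the differing byte directly from the loaded words. */
        unsigned char ca = (unsigned char)*a, cb = (unsigned char)*b;
        if (ca != cb || ca == 0) return (ca > cb) - (ca < cb);
        a++;
        b++;
    }
}
```

The boundary test only looks at the low 12 bits of each address, which
is why no query of the real page size is needed: any larger page size
that is a multiple of 4K is covered by the same check.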

Comparison of the default bionic and proposed optimized routines
shows the following performance improvements on A64 (using the
new proposed strcmp input data from test_strcmp.xml):

  - Small improvement for aligned arguments with sizes up to 56 bytes
    (from 10% to 20%).

  - Large improvements for unaligned arguments for small sizes (from
    3 to 256 bytes).

Benchmark                              Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------
BM_string_strcmp/1/0/0              +0.0034         +0.0034            11            11            11            11
BM_string_strcmp/2/0/0              +0.0000         +0.0000            11            11            11            11
BM_string_strcmp/3/0/0              -0.1726         -0.1726            11             9            11             9
BM_string_strcmp/4/0/0              -0.1726         -0.1726            11             9            11             9
BM_string_strcmp/5/0/0              -0.1726         -0.1726            11             9            11             9
BM_string_strcmp/6/0/0              -0.1719         -0.1719            11             9            11             9
BM_string_strcmp/7/0/0              -0.1724         -0.1724            11             9            11             9
BM_string_strcmp/8/0/0              -0.1718         -0.1718            11             9            11             9
BM_string_strcmp/9/0/0              -0.2008         -0.2008            16            13            16            13
BM_string_strcmp/10/0/0             -0.2008         -0.2008            16            13            16            13
BM_string_strcmp/11/0/0             -0.2040         -0.2040            16            13            16            13
BM_string_strcmp/12/0/0             -0.1991         -0.1991            16            13            16            13
BM_string_strcmp/13/0/0             -0.1997         -0.1997            16            13            16            13
BM_string_strcmp/14/0/0             -0.1988         -0.1989            16            13            16            13
BM_string_strcmp/15/0/0             -0.2006         -0.2006            16            13            16            13
BM_string_strcmp/16/0/0             -0.2043         -0.2043            16            13            16            13
BM_string_strcmp/24/0/0             -0.1927         -0.1927            18            15            18            15
BM_string_strcmp/32/0/0             -0.1743         -0.1743            20            17            20            17
BM_string_strcmp/40/0/0             -0.1427         -0.1427            22            19            22            19
BM_string_strcmp/48/0/0             -0.1053         -0.1053            24            22            24            22
BM_string_strcmp/56/0/0             -0.0805         -0.0805            26            24            26            24
BM_string_strcmp/64/0/0             -0.0454         -0.0454            28            27            28            27
BM_string_strcmp/72/0/0             -0.0303         -0.0303            30            29            30            29
BM_string_strcmp/80/0/0             -0.0111         -0.0111            32            32            32            32
BM_string_strcmp/88/0/0             -0.0004         -0.0004            34            34            34            34
BM_string_strcmp/96/0/0             -0.0058         -0.0058            37            37            37            37
BM_string_strcmp/104/0/0            +0.0000         +0.0000            40            40            40            40
BM_string_strcmp/112/0/0            -0.0457         -0.0457            61            58            61            58
BM_string_strcmp/120/0/0            -0.0486         -0.0487            61            58            61            58
BM_string_strcmp/128/0/0            -0.0499         -0.0499            64            61            64            61
BM_string_strcmp/136/0/0            -0.0529         -0.0529            66            63            66            63
BM_string_strcmp/144/0/0            -0.0492         -0.0492            69            66            69            66
BM_string_strcmp/160/0/0            -0.0459         -0.0459            74            71            74            71
BM_string_strcmp/176/0/0            -0.0400         -0.0401            79            76            79            76
BM_string_strcmp/192/0/0            -0.0378         -0.0378            85            81            85            81
BM_string_strcmp/208/0/0            -0.0009         -0.0009            89            89            89            89
BM_string_strcmp/224/0/0            -0.0003         -0.0003            95            95            95            95
BM_string_strcmp/240/0/0            -0.0320         -0.0320           100            96           100            96
BM_string_strcmp/256/0/0            -0.0303         -0.0304           105           102           105           102
BM_string_strcmp/512/0/0            -0.0171         -0.0171           187           183           187           183
BM_string_strcmp/1024/0/0           -0.0091         -0.0091           350           347           350           347
BM_string_strcmp/8192/0/0           -0.0030         -0.0031          2668          2660          2668          2660
BM_string_strcmp/16384/0/0          +0.0007         +0.0007          5449          5452          5448          5452
BM_string_strcmp/32768/0/0          +0.0635         +0.0635         10868         11558         10867         11557
BM_string_strcmp/65536/0/0          -0.0017         -0.0017         21824         21786         21822         21784
BM_string_strcmp/131072/0/0         +0.0012         +0.0012         43485         43536         43480         43532
BM_string_strcmp/1/4/0              +0.7630         +0.7630             7            12             7            12
BM_string_strcmp/2/4/0              +0.9265         +0.9265            12            23            12            23
BM_string_strcmp/3/4/0              -0.0000         -0.0000            14            14            14            14
BM_string_strcmp/4/4/0              +0.0372         +0.0372            19            19            19            19
BM_string_strcmp/6/4/0              -0.0921         -0.0921            20            19            20            19
BM_string_strcmp/7/4/0              -0.0291         -0.0291            19            19            19            19
BM_string_strcmp/8/4/0              +0.0648         +0.0648            20            22            20            22
BM_string_strcmp/9/4/0              +0.0001         -0.0055            22            22            22            22
BM_string_strcmp/10/4/0             -0.1924         -0.1924            23            19            23            19
BM_string_strcmp/11/4/0             -0.2347         -0.2347            24            19            24            19
BM_string_strcmp/12/4/0             -0.2738         -0.2739            26            19            26            19
BM_string_strcmp/13/4/0             -0.3804         -0.3804            42            26            42            26
BM_string_strcmp/14/4/0             -0.3581         -0.3582            41            26            41            26
BM_string_strcmp/15/4/0             -0.3905         -0.3905            43            26            43            26
BM_string_strcmp/16/4/0             -0.4068         -0.4068            44            26            44            26
BM_string_strcmp/24/4/0             -0.4917         -0.4917            57            29            57            29
BM_string_strcmp/32/4/0             -0.5607         -0.5607            70            31            70            31
BM_string_strcmp/40/4/0             -0.5940         -0.5940            82            33            82            33
BM_string_strcmp/48/4/0             -0.5303         -0.5302            95            45            95            45
BM_string_strcmp/56/4/0             -0.4975         -0.4975           108            54           108            54
BM_string_strcmp/64/4/0             -0.5167         -0.5167           121            58           121            58
BM_string_strcmp/72/4/0             -0.5325         -0.5325           133            62           133            62
BM_string_strcmp/80/4/0             -0.5523         -0.5523           146            65           146            65
BM_string_strcmp/88/4/0             -0.5686         -0.5686           159            69           159            69
BM_string_strcmp/96/4/0             -0.5815         -0.5815           172            72           172            72
BM_string_strcmp/104/4/0            -0.5931         -0.5931           185            75           185            75
BM_string_strcmp/112/4/0            -0.6046         -0.6046           197            78           197            78
BM_string_strcmp/120/4/0            -0.6113         -0.6113           210            82           210            82
BM_string_strcmp/128/4/0            -0.6186         -0.6186           223            85           223            85
BM_string_strcmp/136/4/0            -0.6278         -0.6278           237            88           237            88
BM_string_strcmp/144/4/0            -0.6410         -0.6410           253            91           253            91
BM_string_strcmp/160/4/0            -0.6506         -0.6506           280            98           280            98
BM_string_strcmp/176/4/0            -0.6593         -0.6593           304           104           304           104
BM_string_strcmp/192/4/0            -0.6647         -0.6647           330           111           330           111
BM_string_strcmp/208/4/0            -0.6741         -0.6741           357           116           357           116
BM_string_strcmp/224/4/0            -0.6761         -0.6761           381           123           381           123
BM_string_strcmp/240/4/0            -0.6824         -0.6824           406           129           406           129
BM_string_strcmp/256/4/0            -0.6846         -0.6846           432           136           432           136
BM_string_strcmp/1/0/4              +1.0024         +1.0024             7            14             7            14
BM_string_strcmp/2/0/4              +0.1591         +0.1591            12            14            12            14
BM_string_strcmp/3/0/4              -0.0015         -0.0015            14            14            14            14
BM_string_strcmp/4/0/4              -0.0809         -0.0809            15            14            15            14
BM_string_strcmp/5/0/4              -0.1535         -0.1536            17            14            17            14
BM_string_strcmp/6/0/4              -0.2111         -0.2111            18            14            18            14
BM_string_strcmp/7/0/4              -0.2650         -0.2650            19            14            19            14
BM_string_strcmp/8/0/4              -0.3118         -0.3118            20            14            20            14
BM_string_strcmp/9/0/4              -0.1741         -0.1740            22            18            22            18
BM_string_strcmp/10/0/4             -0.2201         -0.2201            23            18            23            18
BM_string_strcmp/11/0/4             -0.2610         -0.2610            24            18            24            18
BM_string_strcmp/12/0/4             -0.2987         -0.2987            26            18            26            18
BM_string_strcmp/13/0/4             -0.5748         -0.5748            42            18            42            18
BM_string_strcmp/14/0/4             -0.5796         -0.5796            43            18            43            18
BM_string_strcmp/15/0/4             -0.6167         -0.6167            47            18            47            18
BM_string_strcmp/16/0/4             -0.6303         -0.6303            49            18            49            18
BM_string_strcmp/24/0/4             -0.6557         -0.6557            61            21            61            21
BM_string_strcmp/32/0/4             -0.6612         -0.6612            70            24            70            24
BM_string_strcmp/40/0/4             -0.6812         -0.6813            82            26            82            26
BM_string_strcmp/48/0/4             -0.6974         -0.6974            95            29            95            29
BM_string_strcmp/56/0/4             -0.7151         -0.7151           108            31           108            31
BM_string_strcmp/64/0/4             -0.5717         -0.5717           121            52           121            52
BM_string_strcmp/72/0/4             -0.5927         -0.5927           134            54           134            54
BM_string_strcmp/80/0/4             -0.6004         -0.6004           146            58           146            58
BM_string_strcmp/88/0/4             -0.6145         -0.6145           159            61           159            61
BM_string_strcmp/96/0/4             -0.6287         -0.6287           172            64           172            64
BM_string_strcmp/104/0/4            -0.6351         -0.6351           185            67           185            67
BM_string_strcmp/112/0/4            -0.6423         -0.6423           197            71           197            71
BM_string_strcmp/120/0/4            -0.6489         -0.6489           210            74           210            74
BM_string_strcmp/128/0/4            -0.6578         -0.6578           223            76           223            76
BM_string_strcmp/136/0/4            -0.6597         -0.6597           236            80           236            80
BM_string_strcmp/144/0/4            -0.6674         -0.6674           250            83           250            83
BM_string_strcmp/160/0/4            -0.6751         -0.6751           274            89           274            89
BM_string_strcmp/176/0/4            -0.6798         -0.6798           300            96           300            96
BM_string_strcmp/192/0/4            -0.6873         -0.6855           327           102           325           102
BM_string_strcmp/208/0/4            -0.6903         -0.6903           351           109           351           109
BM_string_strcmp/224/0/4            -0.6907         -0.6907           376           116           376           116
BM_string_strcmp/240/0/4            -0.6897         -0.6897           402           125           402           125
BM_string_strcmp/256/0/4            -0.6937         -0.6937           427           131           427           131
BM_string_strcmp/1/4/4              +0.0009         +0.0009            14            14            14            14
BM_string_strcmp/2/4/4              -0.2229         -0.2229            14            11            14            11
BM_string_strcmp/3/4/4              -0.2256         -0.2256            14            11            14            11
BM_string_strcmp/4/4/4              -0.2241         -0.2240            14            11            14            11
BM_string_strcmp/5/4/4              -0.2220         -0.2220            20            15            20            15
BM_string_strcmp/6/4/4              -0.2267         -0.2267            20            15            20            15
BM_string_strcmp/7/4/4              -0.2228         -0.2227            20            15            20            15
BM_string_strcmp/8/4/4              -0.2219         -0.2219            20            15            20            15
BM_string_strcmp/9/4/4              -0.2220         -0.2220            20            15            20            15
BM_string_strcmp/10/4/4             -0.2227         -0.2227            20            15            20            15
BM_string_strcmp/11/4/4             -0.2210         -0.2210            20            15            20            15
BM_string_strcmp/12/4/4             -0.2224         -0.2224            20            15            20            15
BM_string_strcmp/13/4/4             -0.1778         -0.1778            21            17            21            17
BM_string_strcmp/14/4/4             -0.1863         -0.1863            21            17            21            17
BM_string_strcmp/15/4/4             -0.1780         -0.1780            21            17            21            17
BM_string_strcmp/16/4/4             +0.0031         +0.0031            21            21            21            21
BM_string_strcmp/24/4/4             +0.0041         +0.0041            24            24            24            24
BM_string_strcmp/32/4/4             -0.0001         -0.0000            25            25            25            25
BM_string_strcmp/40/4/4             +0.0016         +0.0016            26            26            26            26
BM_string_strcmp/48/4/4             +0.0001         +0.0001            28            28            28            28
BM_string_strcmp/56/4/4             -0.0001         -0.0001            30            30            30            30
BM_string_strcmp/64/4/4             -0.0342         -0.0342            32            31            32            31
BM_string_strcmp/72/4/4             -0.0186         -0.0186            34            34            34            34
BM_string_strcmp/80/4/4             +0.0004         +0.0004            36            36            36            36
BM_string_strcmp/88/4/4             -0.0000         -0.0000            39            39            39            39
BM_string_strcmp/96/4/4             -0.0510         -0.0510            62            59            62            59
BM_string_strcmp/104/4/4            -0.0502         -0.0502            63            60            63            60
BM_string_strcmp/112/4/4            -0.0490         -0.0490            65            62            65            62
BM_string_strcmp/120/4/4            -0.0387         -0.0387            67            65            67            65
BM_string_strcmp/128/4/4            -0.0426         -0.0426            70            67            70            67
BM_string_strcmp/136/4/4            -0.0408         -0.0408            73            70            73            70
BM_string_strcmp/144/4/4            -0.0194         -0.0194            75            74            75            74
BM_string_strcmp/160/4/4            -0.0035         -0.0035            81            81            81            81
BM_string_strcmp/176/4/4            -0.0001         -0.0001            86            86            86            86
BM_string_strcmp/192/4/4            -0.0002         -0.0002            91            91            91            91
BM_string_strcmp/208/4/4            -0.0335         -0.0335            96            93            96            93
BM_string_strcmp/224/4/4            -0.0314         -0.0314           101            98           101            98
BM_string_strcmp/240/4/4            -0.0303         -0.0303           106           103           106           103
BM_string_strcmp/256/4/4            -0.0288         -0.0288           111           108           111           108

[1] Commit id: f98f2a6780d686ca3d44f8011c7823d42d9b083a

Test: bionic tests and benchmarks on aarch64.
Change-Id: I75f8948782b8bd459d21f15e75e1d420905f5e5a
2018-08-21 17:48:52 +00:00
Adhemerval Zanella
4ab56af82d [AArch64] Optimize memcmp for medium to large sizes
This patch was originally written by Siddhesh Poyarekar and pushed on
cortex-strings [1]. This improved memcmp provides a fast path for
compares up to 16 bytes and then compares 16 bytes at a time, thus
optimizing loads from both sources.
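
The structure described above can be sketched in C roughly as follows.
This is an illustration under assumptions, not the routine from the
patch: sketch_memcmp and cmp_bytes are hypothetical names, and the short
path here is a plain byte loop, whereas a tuned implementation would
typically use a few wider loads instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise compare, used for short inputs and to resolve a mismatching chunk. */
static int cmp_bytes(const unsigned char *a, const unsigned char *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* Hypothetical memcmp sketch: short fast path, then 16 bytes per iteration. */
static int sketch_memcmp(const void *s1, const void *s2, size_t n) {
    const unsigned char *a = s1;
    const unsigned char *b = s2;

    if (n <= 16) return cmp_bytes(a, b, n); /* fast path for short compares */

    while (n >= 16) {
        uint64_t a0, a1, b0, b1;
        /* Two 8-byte loads from each source per iteration. */
        memcpy(&a0, a, 8);
        memcpy(&a1, a + 8, 8);
        memcpy(&b0, b, 8);
        memcpy(&b1, b + 8, 8);
        if (a0 != b0 || a1 != b1) return cmp_bytes(a, b, 16);
        a += 16;
        b += 16;
        n -= 16;
    }
    return cmp_bytes(a, b, n); /* remaining tail of fewer than 16 bytes */
}
```

Only equality is tested on the loaded words, so the sketch does not
depend on byte order; the ordering of the result is decided by the
byte loop once a differing chunk has been found.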

Comparison of the default bionic and proposed optimized routines
shows the following performance improvements on A72 (using the
new proposed memcmp input data from test_memcmp.xml):

Benchmark                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------
BM_string_memcmp/1/0/0               -0.2074         -0.2074            15            12            15            12
BM_string_memcmp/2/0/0               -0.5193         -0.5193            31            15            31            15
BM_string_memcmp/3/0/0               -0.1291         -0.1291            19            17            19            17
BM_string_memcmp/4/0/0               -0.2889         -0.2889            17            12            17            12
BM_string_memcmp/5/0/0               -0.2606         -0.2606            15            11            15            11
BM_string_memcmp/6/0/0               -0.1656         -0.1655            17            14            17            14
BM_string_memcmp/7/0/0               -0.1721         -0.1721            19            15            19            15
BM_string_memcmp/8/0/0               -0.3048         -0.3048            15            10            15            10
BM_string_memcmp/9/0/0               -0.3041         -0.3041            15            10            15            10
BM_string_memcmp/10/0/0              -0.3040         -0.3040            15            10            15            10
BM_string_memcmp/11/0/0              -0.3048         -0.3048            15            10            15            10
BM_string_memcmp/12/0/0              -0.3041         -0.3041            15            10            15            10
BM_string_memcmp/13/0/0              -0.3040         -0.3040            15            10            15            10
BM_string_memcmp/14/0/0              -0.3048         -0.3048            15            10            15            10
BM_string_memcmp/15/0/0              -0.3040         -0.3040            15            10            15            10
BM_string_memcmp/16/0/0              -0.3041         -0.3041            15            10            15            10
BM_string_memcmp/24/0/0              -0.1209         -0.1209            15            13            15            13
BM_string_memcmp/32/0/0              -0.3228         -0.3228            20            13            20            13
BM_string_memcmp/40/0/0              -0.2937         -0.2937            22            15            22            15
BM_string_memcmp/48/0/0              -0.3299         -0.3299            23            15            23            15
BM_string_memcmp/56/0/0              -0.1845         -0.1845            24            20            24            20
BM_string_memcmp/64/0/0              -0.2247         -0.2247            26            20            26            20
BM_string_memcmp/72/0/0              -0.1947         -0.1947            27            22            27            22
BM_string_memcmp/80/0/0              -0.2275         -0.2275            28            22            28            22
BM_string_memcmp/88/0/0              -0.2360         -0.2360            29            22            29            22
BM_string_memcmp/96/0/0              -0.2675         -0.2675            31            22            31            22
BM_string_memcmp/104/0/0             -0.2559         -0.2559            32            24            32            24
BM_string_memcmp/112/0/0             -0.2787         -0.2786            33            24            33            24
BM_string_memcmp/120/0/0             -0.2599         -0.2599            34            25            34            25
BM_string_memcmp/128/0/0             -0.2860         -0.2860            35            25            35            25
BM_string_memcmp/136/0/0             -0.4708         -0.4708            53            28            53            28
BM_string_memcmp/144/0/0             -0.4719         -0.4719            53            28            53            28
BM_string_memcmp/160/0/0             -0.4680         -0.4680            56            30            56            30
BM_string_memcmp/176/0/0             -0.4645         -0.4645            60            32            60            32
BM_string_memcmp/192/0/0             -0.4641         -0.4641            63            34            63            34
BM_string_memcmp/208/0/0             -0.4555         -0.4555            66            36            66            36
BM_string_memcmp/224/0/0             -0.4558         -0.4557            69            38            69            38
BM_string_memcmp/240/0/0             -0.4534         -0.4534            72            40            72            40
BM_string_memcmp/256/0/0             -0.4463         -0.4463            75            42            75            42
BM_string_memcmp/512/0/0             -0.3077         -0.3077           126            88           126            88
BM_string_memcmp/1024/0/0            -0.3493         -0.3493           229           149           229           149
BM_string_memcmp/8192/0/0            -0.4173         -0.4173          1729          1007          1729          1007
BM_string_memcmp/16384/0/0           -0.3855         -0.3855          3377          2076          3377          2075
BM_string_memcmp/32768/0/0           -0.2968         -0.2968          6847          4815          6847          4814
BM_string_memcmp/65536/0/0           -0.2496         -0.2496         13715         10292         13714         10291
BM_string_memcmp/131072/0/0          -0.2676         -0.2676         27354         20033         27351         20031
BM_string_memcmp/262144/0/0          -0.2319         -0.2319         54604         41943         54598         41939
BM_string_memcmp/524288/0/0          -0.2359         -0.2359        109225         83460        109212         83449
BM_string_memcmp/1048576/0/0         -0.0439         -0.0439        423367        404791        423251        404686
BM_string_memcmp/2097152/0/0         -0.0023         -0.0024        762470        760701        761956        760122
BM_string_memcmp/512/4/4             -0.2853         -0.2853           125            89           125            89
BM_string_memcmp/1024/4/4            -0.3377         -0.3377           228           151           227           151
BM_string_memcmp/8192/4/4            -0.4083         -0.4083          1706          1009          1706          1009
BM_string_memcmp/16384/4/4           -0.3853         -0.3853          3376          2075          3376          2075
BM_string_memcmp/32768/4/4           -0.2974         -0.2974          6846          4810          6845          4810
BM_string_memcmp/65536/4/4           -0.2485         -0.2485         13619         10235         13618         10234
BM_string_memcmp/131072/4/4          -0.2387         -0.2387         27056         20597         27054         20595
BM_string_memcmp/512/4/0             -0.2898         -0.2898           123            88           123            88
BM_string_memcmp/1024/4/0            -0.3401         -0.3401           225           149           225           149
BM_string_memcmp/8192/4/0            -0.4167         -0.4167          1727          1007          1727          1007
BM_string_memcmp/16384/4/0           -0.3820         -0.3820          3384          2092          3384          2091
BM_string_memcmp/32768/4/0           -0.2535         -0.2535          6886          5141          6886          5140
BM_string_memcmp/65536/4/0           -0.1897         -0.1897         13850         11223         13849         11223
BM_string_memcmp/131072/4/0          -0.1972         -0.1972         27536         22106         27533         22104
BM_string_memcmp/512/0/4             -0.2854         -0.2854           125            89           125            89
BM_string_memcmp/1024/0/4            -0.3332         -0.3333           226           151           226           151
BM_string_memcmp/8192/0/4            -0.4199         -0.4199          1740          1009          1740          1009
BM_string_memcmp/16384/0/4           -0.3811         -0.3811          3383          2094          3383          2094
BM_string_memcmp/32768/0/4           -0.2409         -0.2409          6900          5238          6899          5237
BM_string_memcmp/65536/0/4           -0.1920         -0.1920         13922         11250         13921         11248
BM_string_memcmp/131072/0/4          -0.2029         -0.2029         27699         22079         27697         22077

I see similar improvements on A54 as well:

Benchmark                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------
BM_string_memcmp/1/0/0               -0.2074         -0.2074            15            12            15            12
BM_string_memcmp/2/0/0               -0.5193         -0.5193            31            15            31            15
BM_string_memcmp/3/0/0               -0.1291         -0.1291            19            17            19            17
BM_string_memcmp/4/0/0               -0.2889         -0.2889            17            12            17            12
BM_string_memcmp/5/0/0               -0.2606         -0.2606            15            11            15            11
BM_string_memcmp/6/0/0               -0.1656         -0.1655            17            14            17            14
BM_string_memcmp/7/0/0               -0.1721         -0.1721            19            15            19            15
BM_string_memcmp/8/0/0               -0.3048         -0.3048            15            10            15            10
BM_string_memcmp/9/0/0               -0.3041         -0.3041            15            10            15            10
BM_string_memcmp/10/0/0              -0.3040         -0.3040            15            10            15            10
BM_string_memcmp/11/0/0              -0.3048         -0.3048            15            10            15            10
BM_string_memcmp/12/0/0              -0.3041         -0.3041            15            10            15            10
BM_string_memcmp/13/0/0              -0.3040         -0.3040            15            10            15            10
BM_string_memcmp/14/0/0              -0.3048         -0.3048            15            10            15            10
BM_string_memcmp/15/0/0              -0.3040         -0.3040            15            10            15            10
BM_string_memcmp/16/0/0              -0.3041         -0.3041            15            10            15            10
BM_string_memcmp/24/0/0              -0.1209         -0.1209            15            13            15            13
BM_string_memcmp/32/0/0              -0.3228         -0.3228            20            13            20            13
BM_string_memcmp/40/0/0              -0.2937         -0.2937            22            15            22            15
BM_string_memcmp/48/0/0              -0.3299         -0.3299            23            15            23            15
BM_string_memcmp/56/0/0              -0.1845         -0.1845            24            20            24            20
BM_string_memcmp/64/0/0              -0.2247         -0.2247            26            20            26            20
BM_string_memcmp/72/0/0              -0.1947         -0.1947            27            22            27            22
BM_string_memcmp/80/0/0              -0.2275         -0.2275            28            22            28            22
BM_string_memcmp/88/0/0              -0.2360         -0.2360            29            22            29            22
BM_string_memcmp/96/0/0              -0.2675         -0.2675            31            22            31            22
BM_string_memcmp/104/0/0             -0.2559         -0.2559            32            24            32            24
BM_string_memcmp/112/0/0             -0.2787         -0.2786            33            24            33            24
BM_string_memcmp/120/0/0             -0.2599         -0.2599            34            25            34            25
BM_string_memcmp/128/0/0             -0.2860         -0.2860            35            25            35            25
BM_string_memcmp/136/0/0             -0.4708         -0.4708            53            28            53            28
BM_string_memcmp/144/0/0             -0.4719         -0.4719            53            28            53            28
BM_string_memcmp/160/0/0             -0.4680         -0.4680            56            30            56            30
BM_string_memcmp/176/0/0             -0.4645         -0.4645            60            32            60            32
BM_string_memcmp/192/0/0             -0.4641         -0.4641            63            34            63            34
BM_string_memcmp/208/0/0             -0.4555         -0.4555            66            36            66            36
BM_string_memcmp/224/0/0             -0.4558         -0.4557            69            38            69            38
BM_string_memcmp/240/0/0             -0.4534         -0.4534            72            40            72            40
BM_string_memcmp/256/0/0             -0.4463         -0.4463            75            42            75            42
BM_string_memcmp/512/0/0             -0.3077         -0.3077           126            88           126            88
BM_string_memcmp/1024/0/0            -0.3493         -0.3493           229           149           229           149
BM_string_memcmp/8192/0/0            -0.4173         -0.4173          1729          1007          1729          1007
BM_string_memcmp/16384/0/0           -0.3855         -0.3855          3377          2076          3377          2075
BM_string_memcmp/32768/0/0           -0.2968         -0.2968          6847          4815          6847          4814
BM_string_memcmp/65536/0/0           -0.2496         -0.2496         13715         10292         13714         10291
BM_string_memcmp/131072/0/0          -0.2676         -0.2676         27354         20033         27351         20031
BM_string_memcmp/262144/0/0          -0.2319         -0.2319         54604         41943         54598         41939
BM_string_memcmp/524288/0/0          -0.2359         -0.2359        109225         83460        109212         83449
BM_string_memcmp/1048576/0/0         -0.0439         -0.0439        423367        404791        423251        404686
BM_string_memcmp/2097152/0/0         -0.0023         -0.0024        762470        760701        761956        760122
BM_string_memcmp/512/4/4             -0.2853         -0.2853           125            89           125            89
BM_string_memcmp/1024/4/4            -0.3377         -0.3377           228           151           227           151
BM_string_memcmp/8192/4/4            -0.4083         -0.4083          1706          1009          1706          1009
BM_string_memcmp/16384/4/4           -0.3853         -0.3853          3376          2075          3376          2075
BM_string_memcmp/32768/4/4           -0.2974         -0.2974          6846          4810          6845          4810
BM_string_memcmp/65536/4/4           -0.2485         -0.2485         13619         10235         13618         10234
BM_string_memcmp/131072/4/4          -0.2387         -0.2387         27056         20597         27054         20595
BM_string_memcmp/512/4/0             -0.2898         -0.2898           123            88           123            88
BM_string_memcmp/1024/4/0            -0.3401         -0.3401           225           149           225           149
BM_string_memcmp/8192/4/0            -0.4167         -0.4167          1727          1007          1727          1007
BM_string_memcmp/16384/4/0           -0.3820         -0.3820          3384          2092          3384          2091
BM_string_memcmp/32768/4/0           -0.2535         -0.2535          6886          5141          6886          5140
BM_string_memcmp/65536/4/0           -0.1897         -0.1897         13850         11223         13849         11223
BM_string_memcmp/131072/4/0          -0.1972         -0.1972         27536         22106         27533         22104
BM_string_memcmp/512/0/4             -0.2854         -0.2854           125            89           125            89
BM_string_memcmp/1024/0/4            -0.3332         -0.3333           226           151           226           151
BM_string_memcmp/8192/0/4            -0.4199         -0.4199          1740          1009          1740          1009
BM_string_memcmp/16384/0/4           -0.3811         -0.3811          3383          2094          3383          2094
BM_string_memcmp/32768/0/4           -0.2409         -0.2409          6900          5238          6899          5237
BM_string_memcmp/65536/0/4           -0.1920         -0.1920         13922         11250         13921         11248
BM_string_memcmp/131072/0/4          -0.2029         -0.2029         27699         22079         27697         22077
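
For reading the Time and CPU columns above: they appear to be relative changes, (new - old) / old, so a negative value means the new code is faster. A minimal sketch of that arithmetic follows; the function name is hypothetical, and because the Old/New columns are printed rounded, recomputing from them can differ from the printed ratio in the last digits.

```c
#include <assert.h>

/* Relative change between two timings: (new - old) / old.
 * Negative means the new timing is smaller (faster).
 * The name relative_change is an illustrative assumption. */
static double relative_change(double old_ns, double new_ns) {
    assert(old_ns > 0.0);
    return (new_ns - old_ns) / old_ns;
}
```

For example, the 1048576-byte row (423367 old, 404791 new) gives roughly -0.0439, matching the printed value.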

[1] Commit id: f77e4c932b4fd65177b57dd5e220bd17fb4037d6

Test: bionic tests and benchmarks on aarch64.
Change-Id: I2791e2b20d1c0ad429e8e5a41d3e47b1ac02c921
2018-08-21 17:48:12 +00:00
Haibo Huang
8a0f0ed5e7 Make memcpy memmove
Bug: http://b/63992911
Test: Change BoardConfig.mk and compile for each variant
Change-Id: Ia0cc68d8e90e3316ddb2e9ff1555a009b6a0c5be
2018-06-11 18:12:45 +00:00
Haibo Huang
ece43e14c9 Use cortex-a53/bionic/memmove.S by default for arm64
cortex-a53/bionic/memmove.S looks like a more optimized version. It
should be used in most cases. It delegates small (<= 96 bytes) moves
to memcpy.

The only exception is denver64. It uses its own memcpy, which doesn't
allow overlap for copies of < 96 bytes, so only this variant needs
generic/bionic/memmove.S.
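
The delegation described above can be sketched in C. This is only an illustration, not the assembly in cortex-a53/bionic/memmove.S: the names sketch_memmove, sketch_small_copy and SMALL_MOVE_LIMIT are hypothetical, and the sketch assumes the small-copy path reads the whole source before writing anything, which is what makes delegating overlapping moves to it safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SMALL_MOVE_LIMIT 96  /* threshold quoted in the commit message */

/* Hypothetical stand-in for a memcpy small-size path: it loads the whole
 * source into a temporary before storing, so overlap is harmless. */
static void *sketch_small_copy(void *dst, const void *src, size_t n) {
    unsigned char tmp[SMALL_MOVE_LIMIT];
    assert(n <= SMALL_MOVE_LIMIT);
    memcpy(tmp, src, n);  /* tmp never overlaps src or dst */
    memcpy(dst, tmp, n);
    return dst;
}

/* Illustrative memmove: small moves are delegated, larger ones pick a
 * copy direction that never overwrites bytes that are still unread. */
void *sketch_memmove(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n <= SMALL_MOVE_LIMIT)
        return sketch_small_copy(dst, src, n);

    if ((uintptr_t)d < (uintptr_t)s) {
        for (size_t i = 0; i < n; i++)  /* forward copy */
            d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        while (n--)                     /* backward copy */
            d[n] = s[n];
    }
    return dst;
}
```

A memcpy that does not tolerate overlap on its small-size path could not be used this way, which is the case the denver64 exception covers.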

Benchmark results look pretty close though (on marlin):

Before: using generic/bionic/memmove.S

-------------------------------------------------------------------
Benchmark                            Time           CPU Iterations
-------------------------------------------------------------------
BM_string_memcpy/8/0/0               6 ns          6 ns  108872005   1.15787GB/s
BM_string_memcpy/64/0/0              7 ns          7 ns  107387438   9.14365GB/s
BM_string_memcpy/512/0/0            21 ns         20 ns   34165353   23.2734GB/s
BM_string_memcpy/1024/0/0           40 ns         39 ns   17766657   24.2346GB/s
BM_string_memcpy/8192/0/0          311 ns        310 ns    2259904   24.6339GB/s
BM_string_memcpy/16384/0/0         616 ns        613 ns    1143027   24.8852GB/s
BM_string_memcpy/32768/0/0        1322 ns       1316 ns     530799   23.1835GB/s
BM_string_memcpy/65536/0/0        2672 ns       2661 ns     229638    22.937GB/s
BM_string_memcpy/131072/0/0       5379 ns       5357 ns     128316    22.788GB/s

After: using cortex-a53/bionic/memmove.S

-------------------------------------------------------------------
Benchmark                            Time           CPU Iterations
-------------------------------------------------------------------
BM_string_memcpy/8/0/0               6 ns          6 ns  116610749   1.24646GB/s
BM_string_memcpy/64/0/0              6 ns          6 ns  115634093   9.84708GB/s
BM_string_memcpy/512/0/0            21 ns         21 ns   34167322   22.8938GB/s
BM_string_memcpy/1024/0/0           39 ns         39 ns   17859445   24.3312GB/s
BM_string_memcpy/8192/0/0          311 ns        310 ns    2260192   24.6325GB/s
BM_string_memcpy/16384/0/0         610 ns        608 ns    1151889   25.0987GB/s
BM_string_memcpy/32768/0/0        1488 ns       1482 ns     532508   20.5988GB/s
BM_string_memcpy/65536/0/0        2421 ns       2411 ns     290502   25.3146GB/s
BM_string_memcpy/131072/0/0       5278 ns       5256 ns     132710   23.2234GB/s
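
The throughput column in the two tables above seems consistent with bytes copied divided by CPU time, expressed in units of 2^30 bytes per second. A minimal sketch under that assumption follows; the function name is hypothetical, and since the ns columns are printed rounded, only the larger sizes reproduce closely.

```c
#include <assert.h>

/* Throughput in units of 2^30 bytes per second, from a byte count and a
 * CPU time in nanoseconds. The 2^30 unit is an assumption inferred from
 * the printed figures, not something the commit message states. */
static double throughput_gib_per_s(double bytes, double cpu_ns) {
    assert(cpu_ns > 0.0);
    return bytes / (cpu_ns * 1e-9) / (1024.0 * 1024.0 * 1024.0);
}
```

With 131072 bytes and 5357 ns this comes to about 22.79, in line with the 22.788GB/s printed in the "Before" table.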

Test: Build and benchmark on marlin
Bug: http://b/63992911
Change-Id: Id85961aca18ba841bcbcfe0d8b162843eab30584
2018-05-30 11:09:19 -07:00